CONDITIONAL QUANTILE PROCESSES BASED ON SERIES OR MANY REGRESSORS

ALEXANDRE BELLONI, VICTOR CHERNOZHUKOV AND IVAN FERNANDEZ-VAL



Abstract. Quantile regression (QR) is a principal regression method for analyzing the 
impact of covariates on outcomes. The impact is described by the conditional quantile 
function and its functionals. In this paper we develop the nonparametric QR series frame- 
work, covering many regressors as a special case, for performing inference on the entire 
conditional quantile function and its linear functionals. In this framework, we approxi- 
mate the entire conditional quantile function by a linear combination of series terms with 
quantile-specific coefficients and estimate the function-valued coefficients from the data. 
We develop large sample theory for the empirical QR coefficient process, namely we ob- 
tain uniform strong approximations to the empirical QR coefficient process by conditionally 
pivotal and Gaussian processes, as well as by gradient and weighted bootstrap processes. 

We apply these results to obtain estimation and inference methods for linear function- 
als of the conditional quantile function, such as the conditional quantile function itself, its 
partial derivatives, average partial derivatives, and conditional average partial derivatives. 
Specifically, we obtain uniform rates of convergence, large sample distributions, and infer- 
ence methods based on strong pivotal and Gaussian approximations and on gradient and 
weighted bootstraps. All of the above results are for function-valued parameters, holding 
uniformly in both the quantile index and in the covariate value, and covering the pointwise 
case as a by-product. If the function of interest is monotone, we show how to use mono- 
tonization procedures to improve estimation and inference. We demonstrate the practical 
utility of these results with an empirical example, where we estimate the price elasticity 
function of the individual demand for gasoline, as indexed by the individual unobserved 
propensity for gasoline consumption.

1. Introduction

Quantile regression (QR) is a principal regression method for analyzing the impact of 
covariates on outcomes, particularly when the impact might be heterogeneous. This impact 
is characterized by the conditional quantile function and its functionals [3], [23]. For example,
we can model the log of the individual demand for some good, Y, as a function of the
price of the good, the income of the individual, and other observed individual characteristics 
X and an unobserved preference U for consuming the good, as 

Y = Q(X,U), 

where the function Q is strictly increasing in the unobservable U. With the normalization 
that U ~ Uniform(0, 1) and the assumption that U and X are independent, the function 
Q(X, u) is the u-th conditional quantile of Y given X, i.e. $Q(X, u) = Q_{Y|X}(u|X)$. This
function can be used for policy analysis. For example, we can determine how changes in 
taxes for the good could impact demand heterogeneously across individuals. 
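
The normalization above can be checked by simulation. The following is a minimal sketch, assuming a hypothetical structural function `Q` (not taken from the paper) that is strictly increasing in u: when U is Uniform(0, 1) and the covariate is held fixed at x, the empirical u-quantile of Y = Q(x, U) should be close to Q(x, u).

```python
import math
import random

def Q(x, u):
    """Hypothetical structural function, strictly increasing in u on (0, 1)."""
    return x + (1.0 + x) * math.log(u / (1.0 - u))

def empirical_quantile(values, u):
    """Smallest order statistic whose empirical CDF is at least u."""
    ordered = sorted(values)
    k = max(math.ceil(u * len(ordered)) - 1, 0)
    return ordered[k]

def simulate_outcomes(x, n, seed=0):
    """Draw Y = Q(x, U) with U ~ Uniform(0, 1), holding the covariate fixed at x."""
    rng = random.Random(seed)
    draws = []
    while len(draws) < n:
        u = rng.random()
        if 0.0 < u < 1.0:  # guard against log(0) at the boundary
            draws.append(Q(x, u))
    return draws

x, u = 1.0, 0.75
y = simulate_outcomes(x, n=20000)
print(empirical_quantile(y, u), Q(x, u))  # the two numbers should be close
```

The functional form of `Q`, the sample size and the seed are illustrative choices only.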

In this paper we develop the nonparametric QR series framework for performing inference 
on the entire conditional quantile function and its linear functionals. In this framework, 
we approximate the entire conditional quantile function $Q_{Y|X}(u|x)$ by a linear combination
of series terms, $Z(x)'\beta(u)$. The vector $Z(x)$ includes transformations of $x$ that have good
approximation properties such as powers, trigonometric terms, local polynomials, or B-splines.
The function $u \mapsto \beta(u)$ contains quantile-specific coefficients that can be estimated from
the data using the QR estimator of Koenker and Bassett [25]. As the number of series
terms grows, the approximation error $Q_{Y|X}(u|x) - Z(x)'\beta(u)$ decreases, approaching zero
in the limit. By controlling the growth of the number of terms, we can obtain consistent 
estimators and perform inference on the entire conditional quantile function and its linear 
functionals. The QR series framework also covers as a special case the so called many 
regressors model, which is motivated by many new types of data that emerge in the new 
information age, such as scanner and online shopping data. 

We now describe the main results in more detail. Let $\hat\beta(\cdot)$ denote the QR estimator of
$\beta(\cdot)$. The first set of results provides large-sample theory for the empirical QR coefficient
process of increasing dimension $\sqrt{n}(\hat\beta(\cdot) - \beta(\cdot))$. We obtain uniform strong approximations
to this process by a sequence of the following stochastic processes of increasing dimension:

(i) a conditionally pivotal process, 

(ii) a gradient bootstrap process,

(iii) a Gaussian process, and

(iv) a weighted bootstrap process. 

To the best of our knowledge, all of the above results are new. The existence of the pivotal 
approximation emerges from the special nature of QR, where a (sub)gradient of the sample
objective function evaluated at the truth is pivotal conditional on the regressors. This
allows us to perform high-quality inference without even resorting to Gaussian approxima- 
tions. We also show that the gradient bootstrap, introduced by Parzen, Wei and Ying [30] 
in the parametric context, is effectively a means of carrying out the conditionally pivotal 
approximation without explicitly estimating Jacobian matrices. The conditions for validity 
of these two schemes require only a mild restriction on the growth of the number of series 
terms in relation to the sample size. We also obtain a Gaussian approximation to the entire
distribution of the QR process of increasing dimension by using chaining arguments and
Yurinskii's coupling. Moreover, we show that the weighted bootstrap works to approximate the
distribution of the QR process for the same reason as the Gaussian approximation. The conditions
for validity of the Gaussian and weighted bootstrap approximations, however, appear
to be substantively stronger than for the pivotal and gradient bootstrap approximations. 

The second set of results provides estimation and inference methods for linear functionals 
of the conditional quantile function, including 

(i) the conditional quantile function itself, $(u, x) \mapsto Q_{Y|X}(u|x)$,

(ii) the partial derivative function, $(u, x) \mapsto \partial_{x_k} Q_{Y|X}(u|x)$,

(iii) the average partial derivative function, $u \mapsto \int \partial_{x_k} Q_{Y|X}(u|x)\, d\mu(x)$, and

(iv) the conditional average partial derivative, $(u, x_k) \mapsto \int \partial_{x_k} Q_{Y|X}(u|x)\, d\mu(x|x_k)$,

where $\mu$ is a given measure and $x_k$ is the k-th component of $x$. Specifically, we derive uniform
rates of convergence, large sample distributions and inference methods based on the
strong pivotal and Gaussian approximations and on the gradient and weighted bootstraps.
It is noteworthy that all of the above results apply to function-valued parameters, holding
uniformly in both the quantile index and the covariate value, and covering pointwise normality
and rate results as a special case. If the function of interest is monotone, we show
how to use monotonization procedures to improve estimation and inference.

The paper contributes to and builds on the existing important literature on conditional
quantile estimation. First and foremost, we build on the work of He and Shao [22], who
studied the many regressors model and gave pointwise limit theorems for the QR estimator
in the case of a single quantile index. We go beyond the many regressors model to the series
model and develop large sample estimation and inference results for the entire QR process.
We also develop analogous estimation and inference results for the conditional quantile 
function and its linear functionals, such as derivatives, average derivatives, conditional 
average derivatives, and others. None of these results were available in the previous work. 
We also build on Lee [28], who studied QR estimation of partially linear models in the
series framework for a single quantile index, and on Horowitz and Lee [23], who studied
nonparametric QR estimation of additive quantile models for a single quantile index in a
series framework. Our framework covers these partially linear models and additive models 
as important special cases, and allows us to perform inference on a considerably richer set 
of functionals, uniformly across covariate values and a continuum of quantile indices. Other 
very important work includes Chaudhuri [10], Chaudhuri, Doksum and Samarov,
Härdle, Ritov, and Song [20], Cattaneo, Crump, and Jansson [8], and Kong, Linton, and
Xia [27], among others, but this work focused on local, non-series, methods.

Our work also relies on the series literature, at least in a motivational and conceptual 
sense. In particular, we rely on the work of Stone [35], Andrews [2], Newey [29], Chen
and Shen [13], Chen [12] and others that rigorously motivated the series framework as an
approximation scheme and gave pointwise normality results for least squares estimators, 
and on Chen [12] and van de Geer [36] that gave (non-uniform) consistency and rate results 
for general series estimators, including quantile regression for the case of a single quantile 
index. White [38] established non-uniform consistency of nonparametric estimators of the 
conditional quantile function based on a nonlinear series approximation using artificial neu- 
ral networks. In contrast to the previous results, our rate results are uniform in covariate 
values and quantile indices, and cover both the quantile function and its functionals. More- 
over, we not only provide estimation rate results, but also derive a full set of results on 
feasible inference based on the entire quantile regression process. 

While relying on previous work for motivation, our results require developing both new
proof techniques and new approaches to inference. In particular, our proof techniques 
rely on new maximal inequalities for function classes with growing moments and uniform 
entropy. One of our inference approaches involves an approximation to the entire conditional 
quantile process by a conditionally pivotal process, which is not Donsker in general, but 
can be used for high-quality inference. The utility of this new technique is particularly 
apparent in our high-dimensional setting. Under stronger conditions, we also establish an 
asymptotically valid approximation to the quantile regression process by Gaussian processes 
using Yurinskii's coupling. Previously, [15] used Yurinskii's coupling to obtain a strong
approximation to the least squares series estimator. The use of this technique in our context
is new and much more involved, because we approximate an entire empirical QR process of 
an increasing dimension, instead of a vector of increasing dimension, by a Gaussian process. 
Finally, it is noteworthy that our uniform inference results on functionals, where uniformity 
is over covariate values, do not even have analogs in the least squares series literature (the 
extension of our results to least squares is a subject of ongoing research, [5]). 

This paper does not deal with sparse models, where there are some key series terms and 
many "non-key" series terms which ideally should be omitted from estimation. In these 
settings, the goal is to find and indeed remove most of the "non-key" series terms before 
proceeding with estimation. [6] obtained rate results for quantile regression estimators in 
this case, but did not provide inference results. Even though our paper does not explic- 
itly deal with inference in sparse models after model selection, the methods and bounds 
provided herein are useful for analyzing this problem. Investigating this matter rigorously 
is a challenging issue, since it needs to take into account the model selection mistakes in 
estimation, and is beyond the scope of the present paper; however, it is a subject of our 
ongoing research. 

Plan of the paper. The rest of the paper is organized as follows. In Section 2 we describe
the nonparametric QR series model and estimators. In Section 3 we derive asymptotic
theory for the series QR process. In Section 4 we give estimation and inference theory for
linear functionals of the conditional quantile function and show how to improve estimation
and inference by imposing monotonicity restrictions. We then present an empirical
application to the demand for gasoline and a computational experiment calibrated to the
application. The computational algorithms to implement our inference methods and the 
proofs of the main results are collected in the Appendices. 

Notation. In what follows, $S^{m-1}$ denotes the unit sphere in $\mathbb{R}^m$. For $x \in \mathbb{R}^m$, we define
the Euclidean norm as $\|x\| := \sup_{\alpha \in S^{m-1}} |\alpha' x|$. For a set $I$, $\mathrm{diam}(I) = \sup_{v, \bar v \in I} \|v - \bar v\|$
denotes the diameter of $I$, and $\mathrm{int}(I)$ denotes the interior of $I$. For any two real numbers a
and b, $a \vee b = \max\{a, b\}$ and $a \wedge b = \min\{a, b\}$. Calligraphic letters are used to denote the
support of interest of a random variable or vector. For example, $\mathcal{U} \subset (0, 1)$ is the support of
U, $\mathcal{X} \subset \mathbb{R}^d$ is the support of X, and $\mathcal{Z} = \{Z(x) \in \mathbb{R}^m : x \in \mathcal{X}\}$ is the support of $Z = Z(X)$.
The relation $a_n \lesssim b_n$ means that $a_n \le C b_n$ for a constant C and for all n large enough.
We denote by $P^*$ the probability measure induced by conditioning on any realization of
the data $\mathcal{D}_n := \{(Y_i, X_i) : 1 \le i \le n\}$. We say that a random variable $\Delta_n = o_{P^*}(1)$ in
P-probability if for any positive numbers $\epsilon > 0$ and $\eta > 0$, $P\{P^*(|\Delta_n| > \epsilon) > \eta\} = o(1)$, or
equivalently, $P^*(|\Delta_n| > \epsilon) = o_P(1)$. We typically shall omit the qualifier "in P-probability."
The operator $E$ denotes the expectation with respect to the probability measure P, $\mathbb{E}_n$
denotes the expectation with respect to the empirical measure, and $\mathbb{G}_n$ denotes $\sqrt{n}(\mathbb{E}_n - E)$.

2. Series Framework 
2.1. The set-up. The set-up corresponds to the nonparametric QR series framework: 

$$ Y = Q_{Y|X}(U|X) = Z'\beta(U) + R(X, U), \quad U|X \sim \mathrm{Uniform}(0, 1) \ \text{and} \ \beta(u) \in \mathbb{R}^m, $$

where X is a d-vector of elementary regressors, and $Z = Z(X)$ is an m-vector of approximating
functions formed by transformations of X with the first component of Z equal to
one. The function $Z'\beta(u)$ is the series approximation to $Q_{Y|X}(u|X)$ and is linear in the
parameter vector $\beta(u)$, which is defined below. The term $R(X, u)$ is the approximation
error, with the special case of $R(X, u) = 0$ corresponding to the many regressors model.
We allow the series terms $Z(X)$ to change with the sample size n, i.e. $Z(X) = Z_n(X)$, but
we shall omit the explicit index indicating the dependence on n. We refer the reader to
Newey [29] and Chen [12] for examples of series functions, including (i) regression splines,
(ii) polynomials, (iii) trigonometric series, (iv) compactly supported wavelets, and others.
Interestingly, in the latter scheme, the entire collection of series terms is dependent upon 
the sample size. 

For each quantile index $u \in (0, 1)$, the population coefficient $\beta(u)$ is defined as a solution
to

$$ \min_{\beta \in \mathbb{R}^m} E[\rho_u(Y - Z'\beta)] \tag{2.1} $$

where $\rho_u(z) = (u - 1\{z < 0\})z$ is the check function ([24]). Thus, this coefficient is not
a solution to an approximation problem but to a prediction problem. However, we show
in Lemma 1 in Appendix B that the solution to (2.1) inherits important approximation
properties from the best least squares approximation of $Q_{Y|X}(u|\cdot)$ by a linear combination
of $Z(\cdot)$.
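
To make the check function concrete, here is a minimal sketch of the intercept-only case (Z = 1), where minimizing the average check loss over a constant returns a sample u-quantile. The helper names and the toy data are illustrative assumptions, not part of the paper.

```python
def check_loss(z, u):
    """Check function rho_u(z) = (u - 1{z < 0}) * z."""
    return (u - (1.0 if z < 0 else 0.0)) * z

def mean_check_loss(y, b, u):
    """Average check loss of the constant fit b at quantile index u."""
    return sum(check_loss(yi - b, u) for yi in y) / len(y)

def quantile_by_minimization(y, u):
    """Search the data points themselves: the piecewise-linear loss has a minimizer there."""
    return min(sorted(y), key=lambda b: mean_check_loss(y, b, u))

y = [3.0, 1.0, 4.0, 1.5, 9.0, 2.6, 5.0, 3.5, 8.0]
print(quantile_by_minimization(y, 0.5))   # sample median of y
print(quantile_by_minimization(y, 0.25))  # lower quartile of y
```

Searching over data points is enough in this scalar case because the objective is convex and piecewise linear with kinks only at the observations.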

We consider estimation of the coefficient function $u \mapsto \beta(u)$ using the entire quantile
regression process over $\mathcal{U}$, a compact subset of (0, 1),

$$ \{u \mapsto \hat\beta(u), \ u \in \mathcal{U}\}, $$

namely, for each $u \in \mathcal{U}$, the estimator $\hat\beta(u)$ is a solution to the empirical analog of the
population problem (2.1)

$$ \min_{\beta \in \mathbb{R}^m} \mathbb{E}_n[\rho_u(Y_i - Z_i'\beta)]. \tag{2.2} $$

We are also interested in various functionals of $Q_{Y|X}(\cdot|x)$. We estimate them by the corresponding
functionals of $Z(x)'\hat\beta(\cdot)$, such as the estimator of the entire conditional quantile
function, $(x, u) \mapsto Z(x)'\hat\beta(u)$, and derivatives of this function with respect to the elementary
covariates x.
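
The empirical problem (2.2) is a linear program. As a rough sketch only, the case of two series terms, Z(x) = (1, x), can be solved without any solver by using the fact that an optimum of this linear program is attained at a fit interpolating m data points when the design has full rank. The brute-force search below is an illustrative stand-in for a proper QR algorithm, and the function names are assumptions.

```python
from itertools import combinations

def check_loss(z, u):
    """Check function rho_u(z) = (u - 1{z < 0}) * z."""
    return (u - (1.0 if z < 0 else 0.0)) * z

def basis(x):
    """Illustrative series terms Z(x) = (1, x)."""
    return (1.0, x)

def objective(beta, xs, ys, u):
    """Empirical check loss E_n[rho_u(Y_i - Z_i' beta)]."""
    total = 0.0
    for x, y in zip(xs, ys):
        fitted = sum(b * z for b, z in zip(beta, basis(x)))
        total += check_loss(y - fitted, u)
    return total / len(xs)

def fit_qr_two_terms(xs, ys, u):
    """Search all lines through two data points and keep the one with the smallest loss."""
    best, best_val = None, float("inf")
    for i, j in combinations(range(len(xs)), 2):
        if xs[i] == xs[j]:
            continue  # a vertical pair does not define a line in x
        slope = (ys[j] - ys[i]) / (xs[j] - xs[i])
        beta = (ys[i] - slope * xs[i], slope)
        val = objective(beta, xs, ys, u)
        if val < best_val:
            best, best_val = beta, val
    return best

xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.0, 5.0, 7.0, 100.0]          # last point is an outlier
print(fit_qr_two_terms(xs, ys, 0.5))      # median fit ignores the outlier
```

The search costs on the order of n^3 operations, so it is only meant for small illustrations; for larger m or n a linear programming or interior point routine would be the natural choice.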

2.2. Regularity Conditions. In our framework the entire model can change with n, although
we shall omit the explicit indexing by n. We will use the following primitive assumptions
on the data generating process, as $n \to \infty$ and $m = m(n) \to \infty$.

Condition S.

S.1 The data $\mathcal{D}_n = \{(Y_i, X_i')', 1 \le i \le n\}$ are an i.i.d. sequence of real (1 + d)-vectors,
and $Z_i = Z(X_i)$ is a real m-vector for $i = 1, \dots, n$.

S.2 The conditional density of the response variable $f_{Y|X}(y|x)$ is bounded above by $\bar f$
and its derivative in y is bounded above by $\bar f'$, uniformly in the arguments y and
$x \in \mathcal{X}$ and in n; moreover, $f_{Y|X}(Q_{Y|X}(u|x)|x)$ is bounded away from zero uniformly
for all arguments $u \in \mathcal{U}$, $x \in \mathcal{X}$, and n.

S.3 For every m, the eigenvalues of the Gram matrix $\Sigma_m = E[ZZ']$ are bounded from
above and away from zero, uniformly in n.

S.4 The norm of the series terms obeys $\max_{i \le n} \|Z_i\| \le \zeta(m, d, n) := \zeta_m$.

S.5 The approximation error term $R(X, U)$ is such that $\sup_{x \in \mathcal{X}, u \in \mathcal{U}} |R(x, u)| \lesssim m^{-\kappa}$.
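
Conditions S.3 and S.4 can be illustrated with one concrete basis. The sketch below assumes, purely for illustration, a scalar regressor that is Uniform(-1, 1) and series terms given by Legendre polynomials rescaled to be orthonormal under that distribution, so the Gram matrix is the identity. It then computes the largest norm of Z(x) over a grid, which plays the role of the bound in S.4; for this basis the maximum is attained at the endpoints and equals m.

```python
import math

def legendre_terms(x, m):
    """First m Legendre polynomials at x, scaled to be orthonormal under Uniform(-1, 1)."""
    p = [1.0, x][:m]
    for j in range(1, m - 1):
        # Bonnet recurrence: (j + 1) P_{j+1}(x) = (2j + 1) x P_j(x) - j P_{j-1}(x)
        p.append(((2 * j + 1) * x * p[j] - j * p[j - 1]) / (j + 1))
    return [math.sqrt(2 * j + 1) * pj for j, pj in enumerate(p)]

def zeta(m, grid_size=2001):
    """Largest Euclidean norm of Z(x) over an even grid on [-1, 1]."""
    grid = [-1.0 + 2.0 * k / (grid_size - 1) for k in range(grid_size)]
    return max(math.sqrt(sum(z * z for z in legendre_terms(x, m))) for x in grid)

for m in (1, 3, 8):
    print(m, zeta(m))  # grows linearly in m for this polynomial basis
```

The grid size and the choice of Legendre polynomials are assumptions of this sketch; other bases, such as splines, give a slower growth of the bound.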

Comment 2.1. Condition S is a simple set of sufficient conditions. Our proofs and lemmas
gathered in the appendices hold under more general, but less primitive, conditions.
Condition S.2 imposes mild smoothness assumptions on the conditional density function.
Conditions S.3 and S.4 impose plausible conditions on the design, see Newey [29]. Conditions
S.2 and S.3 together imply that the eigenvalues of the Jacobian matrix

$$ J_m(u) = E[f_{Y|X}(Q_{Y|X}(u|X)|X) ZZ'] $$

are bounded away from zero and from above; Lemma 12 further shows that this together
with S.5 implies that the eigenvalues of $\tilde J_m(u) = E[f_{Y|X}(Z'\beta(u)|X) ZZ']$ are bounded away
from zero and from above, since the two matrices are uniformly close. Assumption S.4
imposes a uniform bound on the norm of the transformed vector which can grow with
the sample size. For example, we have $\zeta_m = \sqrt{m}$ for splines and $\zeta_m = m$ for polynomials.
Assumption S.5 introduces a bound on the approximation error and is the only nonprimitive
assumption. Deriving sharp bounds on the sup norm for the approximation error is a
delicate subject of approximation theory even for least squares, with the exception of local
polynomials; see, e.g. [5] for a discussion in the least squares case. However, in Lemma 1
in Appendix B we characterize the nature of the approximation by establishing that the
vector $\beta(u)$ solving (2.1) is equivalent to the least squares approximation to $Q_{Y|X}(u|x)$ in
terms of the order of $L^2$ error. We also deduce a simple upper bound on the sup norm
for the approximation error. In applications, we recommend analyzing the size of the
approximation error numerically. This is possible by considering examples of plausible
functions $Q_{Y|X}(u|x)$ and comparing them to the best linear approximation schemes, as this
directly leads to sharp results. We give an example of how to implement this approach in
the empirical application.
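
A numerical check of the approximation error of the kind recommended above might look as follows. This is a minimal sketch under strong simplifying assumptions: the curve `q_example` is a hypothetical conditional quantile function at a fixed quantile index, the covariate is uniform on [0, 1), and the series terms are indicators of m equal bins, so the least squares approximation is the bin average. None of these choices come from the paper.

```python
import math

def sup_error_piecewise_constant(q, m, grid_size=6400):
    """Sup-norm error when q on [0, 1) is replaced by its average on each of m equal bins."""
    xs = [(k + 0.5) / grid_size for k in range(grid_size)]
    sums, counts = [0.0] * m, [0] * m
    for x in xs:
        b = min(int(x * m), m - 1)
        sums[b] += q(x)
        counts[b] += 1
    means = [s / c for s, c in zip(sums, counts)]
    return max(abs(q(x) - means[min(int(x * m), m - 1)]) for x in xs)

def q_example(x):
    """Hypothetical smooth conditional quantile curve at a fixed quantile index."""
    return math.sin(2.0 * math.pi * x)

errors = {m: sup_error_piecewise_constant(q_example, m) for m in (4, 8, 16, 32, 64)}
for m, err in errors.items():
    print(m, round(err, 4))  # the sup-norm error shrinks as m grows
```

Repeating the exercise with the basis and the candidate functions relevant to an application gives a direct sense of how large m must be for the approximation error to be negligible.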

3. Asymptotic Theory for QR Coefficient Processes based on Series 

3.1. Uniform Convergence Rates for Series QR Coefficients. The first main result 
is a uniform rate of convergence for the series estimators of the QR coefficients. 

Theorem 1 (Uniform Convergence Rate for Series QR Coefficients). Under Condition S, 
and provided that $\zeta_m^2 m \log n = o(n)$,

$$ \sup_{u \in \mathcal{U}} \|\hat\beta(u) - \beta(u)\| \lesssim_P \sqrt{\frac{m \log n}{n}}. $$



Thus, up to a logarithmic factor, the uniform convergence over $\mathcal{U}$ is achieved at the
same rate as for a single quantile index ([22]). The proof of this theorem relies on new 
concentration inequalities that control the behavior of the empirical eigenvalues of the design 
matrix. 

3.2. Uniform Strong Approximations to the Series QR Process. The second main 
result is the approximation of the QR process by a linear, conditionally pivotal process. 

Theorem 2 (Strong Approximations to the QR Process by a Pivotal Coupling). Under
Condition S, $m^3 \zeta_m^2 \log^7 n = o(n)$, and $m^{-\kappa+1} \log^3 n = o(1)$, the QR process is uniformly
close to a conditionally pivotal process, namely

$$ \sqrt{n}(\hat\beta(u) - \beta(u)) = J_m^{-1}(u)\mathbb{U}_n(u) + r_n(u), $$

where

$$ \mathbb{U}_n(u) := \frac{1}{\sqrt{n}} \sum_{i=1}^n Z_i (u - 1\{U_i \le u\}), \tag{3.3} $$

$U_1, \dots, U_n$ are i.i.d. Uniform(0, 1), independently distributed of $Z_1, \dots, Z_n$, and

$$ \sup_{u \in \mathcal{U}} \|r_n(u)\| \lesssim_P \frac{m^{3/4} \zeta_m^{1/2} \log^{3/4} n}{n^{1/4}} + \sqrt{m^{1-\kappa} \log n} = o(1/\log n). $$

The theorem establishes that the QR process is approximately equal to the process 
$J_m^{-1}(\cdot)\mathbb{U}_n(\cdot)$, which is pivotal conditional on $Z_1, \dots, Z_n$. This is quite useful since we can
simulate $\mathbb{U}_n(\cdot)$ in a straightforward fashion, conditional on $Z_1, \dots, Z_n$, and we can estimate
the matrices $J_m(\cdot)$ using Powell's method [32]. Thus, we can perform inference based on the
entire empirical QR process in series regression or many regressors contexts. The following 
theorem establishes the formal validity of this approach, which we call the pivotal method. 
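
Simulating the pivotal process is indeed simple. The sketch below draws copies of the process on a finite grid of quantile indices while holding the series terms fixed; the function names, the grid and the seed are illustrative assumptions. Premultiplying each draw by an estimate of the inverse Jacobian, which is not done here, would give draws of the feasible pivotal process.

```python
import math
import random

def pivotal_process(Z, uniforms, u_grid):
    """U_n(u) = n^{-1/2} * sum_i Z_i * (u - 1{U_i <= u}) for each u in u_grid."""
    n, m = len(Z), len(Z[0])
    paths = []
    for u in u_grid:
        total = [0.0] * m
        for z_i, U_i in zip(Z, uniforms):
            weight = u - (1.0 if U_i <= u else 0.0)
            for k in range(m):
                total[k] += z_i[k] * weight
        paths.append([t / math.sqrt(n) for t in total])
    return paths

def simulate_pivotal_draws(Z, u_grid, n_draws, seed=0):
    """Independent copies of the process, holding the regressors Z fixed."""
    rng = random.Random(seed)
    return [pivotal_process(Z, [rng.random() for _ in Z], u_grid)
            for _ in range(n_draws)]

Z = [[1.0, 0.1 * i] for i in range(20)]   # toy design: intercept and one regressor
draws = simulate_pivotal_draws(Z, u_grid=[0.25, 0.5, 0.75], n_draws=5)
print(len(draws), len(draws[0]), len(draws[0][0]))
```

Each draw is a list indexed by the grid of quantile indices, with one m-vector per index.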

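Theorem 3 (Pivotal Method). Let $\hat J_m(u)$ denote the estimator of $J_m(u)$ defined in (3.7)
with bandwidth $h_n$ obeying $h_n = o(1)$ and $h_n \sqrt{m} \log^{3/2} n = o(1)$. Under Condition S,
$m^{-\kappa+1/2} \log^{3/2} n = o(1)$, and $\zeta_m^2 m^2 \log^4 n = o(n h_n)$, the feasible pivotal process $\hat J_m^{-1}(\cdot)\mathbb{U}_n^*(\cdot)$
correctly approximates a copy $J_m^{-1}(\cdot)\mathbb{U}_n^*(\cdot)$ of the pivotal process defined in Theorem 2,

$$ \hat J_m^{-1}(u)\mathbb{U}_n^*(u) = J_m^{-1}(u)\mathbb{U}_n^*(u) + r_n(u), $$

where

$$ \mathbb{U}_n^*(u) := \frac{1}{\sqrt{n}} \sum_{i=1}^n Z_i (u - 1\{U_i^* \le u\}), \tag{3.4} $$

$U_1^*, \dots, U_n^*$ are i.i.d. Uniform(0, 1), independently distributed of $Z_1, \dots, Z_n$, and

$$ \sup_{u \in \mathcal{U}} \|r_n(u)\| \lesssim_P \sqrt{\frac{\zeta_m^2 m^2 \log^2 n}{n h_n}} + m^{-\kappa+1/2} \sqrt{\log n} + h_n \sqrt{m \log n} = o(1/\log n). $$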

This method is closely related to another approach to inference, which we refer to here as 
the gradient bootstrap method. This approach was previously introduced by Parzen, Wei 
and Ying [30] for parametric models with fixed dimension. We extend it to a considerably 
more general nonparametric series framework. The main idea is to generate draws $\hat\beta^*(\cdot)$ of
the QR process as solutions to QR problems with gradients perturbed by a pivotal quantity
$\mathbb{U}_n^*(\cdot)/\sqrt{n}$. In particular, let us define a gradient bootstrap draw $\hat\beta^*(u)$ as the solution to
the problem

$$ \min_{\beta \in \mathbb{R}^m} \mathbb{E}_n[\rho_u(Y_i - Z_i'\beta)] - \mathbb{U}_n^*(u)'\beta/\sqrt{n}, \tag{3.5} $$

for each $u \in \mathcal{U}$, where $\mathbb{U}_n^*(\cdot)$ is defined in (3.4). The problem is solved many times for
independent draws of $\mathbb{U}_n^*(\cdot)$, and the distribution of $\sqrt{n}(\hat\beta(\cdot) - \beta(\cdot))$ is approximated by the
empirical distribution of the bootstrap draws of $\sqrt{n}(\hat\beta^*(\cdot) - \hat\beta(\cdot))$.

Theorem 4 (Gradient Bootstrap Method). Under Condition S, $\zeta_m^2 m^3 \log^7 n = o(n)$, and
$m^{-\kappa+1/2} \log^{3/2} n = o(1)$, the gradient bootstrap process correctly approximates a copy
$J_m^{-1}(\cdot)\mathbb{U}_n^*(\cdot)$ of the pivotal process defined in Theorem 2,

$$ \sqrt{n}(\hat\beta^*(u) - \hat\beta(u)) = J_m^{-1}(u)\mathbb{U}_n^*(u) + r_n(u), $$

where $\mathbb{U}_n^*(u)$ is defined in (3.4) and

$$ \sup_{u \in \mathcal{U}} \|r_n(u)\| \lesssim_P \frac{m^{3/4} \zeta_m^{1/2} \log^{3/4} n}{n^{1/4}} + m^{-\kappa} \sqrt{m \log n} = o(1/\log n). $$

The stated bound continues to hold in P-probability if we replace the unconditional probability
P by the conditional probability $P^*$.

The main advantage of this method relative to the pivotal method is that it does not
require estimating $J_m(\cdot)$ explicitly. The disadvantage is that in large problems the gradient
bootstrap can be computationally much more expensive. 

Next we turn to a strong approximation based on a sequence of Gaussian processes. 

Theorem 5 (Strong Approximation to the QR Process by a Gaussian Coupling). Under
conditions S.1-S.4 and $m^7 \zeta_m^6 \log^{22} n = o(n)$, there exists a sequence of zero-mean Gaussian
processes $G_n(\cdot)$ with a.s. continuous paths, that has the same covariance functions as the
pivotal process $\mathbb{U}_n(\cdot)$ in (3.4) conditional on $Z_1, \dots, Z_n$, namely,

$$ E[G_n(u)G_n(u')'] = E[\mathbb{U}_n(u)\mathbb{U}_n(u')'] = \mathbb{E}_n[Z_i Z_i'](u \wedge u' - uu'), \quad \text{for all } u \text{ and } u' \in \mathcal{U}. $$

Also, $G_n(\cdot)$ approximates the empirical process $\mathbb{U}_n(\cdot)$, namely,

$$ \sup_{u \in \mathcal{U}} \|\mathbb{U}_n(u) - G_n(u)\| \lesssim_P o(1/\log n). $$

Consequently, if in addition S.5 holds with $m^{-\kappa+1} \log^3 n = o(1)$,

$$ \sup_{u \in \mathcal{U}} \|\sqrt{n}(\hat\beta(u) - \beta(u)) - J_m^{-1}(u)G_n(u)\| \lesssim_P o(1/\log n). $$

Moreover, under the conditions of Theorem 3, the feasible Gaussian process $\hat J_m^{-1}(\cdot)G_n(\cdot)$
correctly approximates $J_m^{-1}(\cdot)G_n(\cdot)$:

$$ \hat J_m^{-1}(u)G_n(u) = J_m^{-1}(u)G_n(u) + r_n(u), $$

where

$$ \sup_{u \in \mathcal{U}} \|r_n(u)\| \lesssim_P \sqrt{\frac{\zeta_m^2 m^2 \log^2 n}{n h_n}} + m^{-\kappa+1/2} \sqrt{\log n} + h_n \sqrt{m \log n} = o(1/\log n). $$

The requirement on the growth of the dimension m in Theorem 5 is a consequence
of a step in the proof that relies upon Yurinskii's coupling. Therefore, improving that 
step through the use of another coupling could lead to significant improvements in such 
requirements. Note, however, that the pivotal approximation has a much weaker growth 
restriction, and it is also clear that this approximation should be more accurate than any 
further approximation of the pivotal approximation by a Gaussian process. 

Another related inference method is the weighted bootstrap for the entire QR process. 
Consider a set of weights $h_1, \dots, h_n$ that are i.i.d. draws from the standard exponential
distribution. For each draw of such weights, define the weighted bootstrap draw of the QR
process as a solution to the QR problem weighted by $h_1, \dots, h_n$:

$$ \hat\beta^b(u) \in \arg\min_{\beta \in \mathbb{R}^m} \mathbb{E}_n[h_i \rho_u(Y_i - Z_i'\beta)], \quad \text{for } u \in \mathcal{U}. $$
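
The weighted bootstrap is easiest to see in the intercept-only case (Z = 1), where the weighted QR problem is solved by a weighted sample quantile. The following is a minimal sketch of that special case with standard exponential weights; the helper names, toy data and seed are illustrative assumptions, and a general series model would require a weighted QR solver instead.

```python
import random

def weighted_quantile(y, weights, u):
    """A minimizer of sum_i h_i * rho_u(y_i - b): the smallest y whose weight share reaches u."""
    pairs = sorted(zip(y, weights))
    threshold = u * sum(weights)
    running = 0.0
    for value, h in pairs:
        running += h
        if running >= threshold:
            return value
    return pairs[-1][0]  # only reached through floating-point rounding

def weighted_bootstrap_draws(y, u, n_draws, seed=0):
    """Recompute the u-quantile under i.i.d. standard exponential weights."""
    rng = random.Random(seed)
    return [weighted_quantile(y, [rng.expovariate(1.0) for _ in y], u)
            for _ in range(n_draws)]

y = [2.1, 0.4, 3.3, 1.7, 5.0, 2.8, 0.9, 4.2]
draws = weighted_bootstrap_draws(y, u=0.5, n_draws=200)
print(min(draws), max(draws))  # spread of the bootstrap medians
```

The empirical distribution of the draws, centered at the unweighted estimate and scaled by the square root of n, is what approximates the distribution of the QR process in this scalar illustration.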



The following theorem establishes that the weighted bootstrap distribution is valid for 
approximating the distribution of the QR process. 

Theorem 6 (Weighted Bootstrap Method). (1) Under Condition S, $\zeta_m^2 m^3 \log^9 n = o(n)$,
and $m^{-\kappa+1} \log^3 n = o(1)$, the weighted bootstrap process satisfies

$$ \sqrt{n}(\hat\beta^b(u) - \hat\beta(u)) = \frac{J_m^{-1}(u)}{\sqrt{n}} \sum_{i=1}^n (h_i - 1) Z_i (u - 1\{U_i \le u\}) + r_n(u), $$

where $U_1, \dots, U_n$ are i.i.d. Uniform(0, 1), independently distributed of $Z_1, \dots, Z_n$, and

$$ \sup_{u \in \mathcal{U}} \|r_n(u)\| \lesssim_P \frac{m^{3/4} \zeta_m^{1/2} \log^{5/4} n}{n^{1/4}} + \sqrt{m^{1-\kappa} \log n} = o(1/\log n). $$

The bound continues to hold in P-probability if we replace the unconditional probability P
by $P^*$.

(2) Furthermore, under the conditions of Theorem 5, the weighted bootstrap process approximates
the Gaussian process $J_m^{-1}(\cdot)G_n(\cdot)$ defined in Theorem 5, that is:

$$ \sup_{u \in \mathcal{U}} \|\sqrt{n}(\hat\beta^b(u) - \hat\beta(u)) - J_m^{-1}(u)G_n(u)\| \lesssim_P o(1/\log n). $$

In comparison with the pivotal and gradient bootstrap methods, the Gaussian and 
weighted bootstrap methods require stronger assumptions.

3.3. Estimation of Matrices. In order to implement some of the inference methods, we
need uniformly consistent estimators of the Gram and Jacobian matrices. The natural
candidates are

$$ \hat\Sigma_m = \mathbb{E}_n[Z_i Z_i'], \tag{3.6} $$

$$ \hat J_m(u) = \frac{1}{2h_n} \mathbb{E}_n[1\{|Y_i - Z_i'\hat\beta(u)| \le h_n\} \cdot Z_i Z_i'], \tag{3.7} $$

where $h_n$ is a bandwidth parameter, such that $h_n \to 0$, and $u \in \mathcal{U}$. The following result
establishes uniform consistency of these estimators and provides an appropriate rate for the
bandwidth $h_n$ which depends on the growth of the model.

Theorem 7 (Estimation of Gram and Jacobian Matrices). If conditions S.1-S.4 and $\zeta_m^2 \log n = o(n)$
hold, then $\hat\Sigma_m - \Sigma_m = o_P(1)$ in the eigenvalue norm. If conditions S.1-S.5, $h_n = o(1)$,
and $m \zeta_m^2 \log n = o(n h_n)$ hold, then $\hat J_m(u) - J_m(u) = o_P(1)$ in the eigenvalue norm uniformly
in $u \in \mathcal{U}$.
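
The two estimators (3.6) and (3.7) translate directly into code. The sketch below uses plain nested lists; the residuals Y_i - Z_i'β̂(u) are assumed to have been computed beforehand by whatever QR routine is in use, and the function names and toy inputs are illustrative.

```python
def gram_matrix(Z):
    """Sigma_hat = E_n[Z_i Z_i'], returned as a list of rows."""
    n, m = len(Z), len(Z[0])
    return [[sum(z[a] * z[b] for z in Z) / n for b in range(m)] for a in range(m)]

def powell_jacobian(Z, residuals, h):
    """J_hat(u) = (1 / (2 h)) * E_n[1{|residual_i| <= h} Z_i Z_i']."""
    n, m = len(Z), len(Z[0])
    inside = [z for z, r in zip(Z, residuals) if abs(r) <= h]
    return [[sum(z[a] * z[b] for z in inside) / (2.0 * h * n) for b in range(m)]
            for a in range(m)]

Z = [[1.0, 0.0], [1.0, 1.0], [1.0, 2.0], [1.0, 3.0]]
residuals = [0.05, -0.5, 0.0, 2.0]     # hypothetical QR residuals at some index u
print(gram_matrix(Z))
print(powell_jacobian(Z, residuals, h=0.1))
```

Only observations whose residual lies within the bandwidth contribute to the Jacobian estimate, which is why the bandwidth conditions in Theorem 7 tie h_n to the growth of m.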

4. Estimation and Inference Methods for Linear Functionals 

In this section we apply the previous results to develop estimation and inference methods 
for various linear functionals of the conditional quantile function. 

4.1. Functionals of Interest. We are interested in various functions $\theta$ created as linear
functionals of $Q_{Y|X}(u|\cdot)$ for all $u \in \mathcal{U}$. Particularly useful examples include, for $x = (w, v)$
and $x_k$ denoting the k-th component of x,

1. the function itself: $\theta(u, x) = Q_{Y|X}(u|x)$;

2. the derivative: $\theta(u, x) = \partial_{x_k} Q_{Y|X}(u|x)$;

3. the average derivative: $\theta(u) = \int \partial_{x_k} Q_{Y|X}(u|x)\, d\mu(x)$;

4. the conditional average derivative: $\theta(u|w) = \int \partial_{x_k} Q_{Y|X}(u|w, v)\, d\mu(v|w)$.

The measures $\mu$ entering the definitions above are taken as known; the result can be extended
to include estimated measures.

Note that in each example we could be interested in estimating $\theta(u, w)$ simultaneously
for many values of $(u, w)$. For $w \in \mathcal{W} \subset \mathbb{R}^d$, let $I \subset \mathcal{U} \times \mathcal{W}$ denote the set of indices $(u, w)$
of interest. For example:

• the function at a particular point $(u, w)$, in which case $I = \{(u, w)\}$,

• the function $u \mapsto \theta(u, w)$, having fixed w, in which case $I = \mathcal{U} \times \{w\}$,

• the function $w \mapsto \theta(u, w)$, having fixed u, in which case $I = \{u\} \times \mathcal{W}$,

• the entire function $(u, w) \mapsto \theta(u, w)$, in which case $I = \mathcal{U} \times \mathcal{W}$.

By the linearity of the series approximations, the above parameters can be seen as linear
functions of the quantile regression coefficients $\beta(u)$ up to an approximation error, that is

$$ \theta(u, w) = \ell(w)'\beta(u) + r_n(u, w), \quad (u, w) \in I, \tag{4.8} $$

where $\ell(w)'\beta(u)$ is the series approximation, with $\ell(w)$ denoting the m-vector of loadings on
the coefficients, and $r_n(u, w)$ is the remainder term, which corresponds to the approximation
error. Indeed, in each of the examples above, this decomposition arises from the application
of different linear operators $\mathcal{A}$ to the decomposition $Q_{Y|X}(u|\cdot) = Z(\cdot)'\beta(u) + R(u, \cdot)$ and
evaluating the resulting functions at w:

$$ (\mathcal{A} Q_{Y|X}(u|\cdot))[w] = (\mathcal{A} Z(\cdot))[w]'\beta(u) + (\mathcal{A} R(u, \cdot))[w]. \tag{4.9} $$

In the four examples above the operator $\mathcal{A}$ is given by, respectively,

1. the identity operator: $(\mathcal{A}f)[x] = f[x]$, so that
$\ell(x) = Z(x)$, $r_n(u, x) = R(u, x)$;

2. a differential operator: $(\mathcal{A}f)[x] = (\partial_{x_k} f)[x]$, so that
$\ell(x) = \partial_{x_k} Z(x)$, $r_n(u, x) = \partial_{x_k} R(u, x)$;

3. an integro-differential operator: $\mathcal{A}f = \int \partial_{x_k} f(x)\, d\mu(x)$, so that
$\ell = \int \partial_{x_k} Z(x)\, d\mu(x)$, $r_n(u) = \int \partial_{x_k} R(u, x)\, d\mu(x)$;

4. a partial integro-differential operator: $(\mathcal{A}f)[w] = \int \partial_{x_k} f(x)\, d\mu(v|w)$, so that
$\ell(w) = \int \partial_{x_k} Z(x)\, d\mu(v|w)$, $r_n(u, w) = \int \partial_{x_k} R(u, x)\, d\mu(v|w)$.

For notational convenience, we use the formulation (4.8) in the analysis, instead of the
motivational formulation (4.9).
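
The loadings in the first three examples can be written out explicitly once a basis is fixed. The sketch below assumes, for illustration only, a scalar covariate with polynomial series terms and a discrete measure μ that puts equal mass on a list of points; the coefficient vector `beta` stands in for β(u) at some fixed quantile index.

```python
def Z(x, m):
    """Polynomial series terms (1, x, ..., x^(m-1)); an illustrative choice of basis."""
    return [x ** j for j in range(m)]

def dZ(x, m):
    """Derivative of each series term with respect to x (loadings for example 2)."""
    return [0.0 if j == 0 else j * x ** (j - 1) for j in range(m)]

def average_dZ(xs, m):
    """Loadings for example 3 under a discrete measure with equal mass on the points xs."""
    cols = [dZ(x, m) for x in xs]
    return [sum(c[j] for c in cols) / len(xs) for j in range(m)]

def functional(loadings, beta):
    """Series approximation l(w)' beta(u) for given loadings and coefficients."""
    return sum(l * b for l, b in zip(loadings, beta))

beta = [1.0, 2.0, 3.0]                             # hypothetical beta(u): 1 + 2x + 3x^2
print(functional(Z(2.0, 3), beta))                 # value of the series approximation at x = 2
print(functional(dZ(2.0, 3), beta))                # its derivative at x = 2
print(functional(average_dZ([0.0, 2.0], 3), beta)) # average derivative over {0, 2}
```

In every case the functional is linear in the coefficients, so inference on it reduces to inference on the QR coefficient process with a known loading vector.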



We shall provide the inference tools that will be valid for inference on the series approximation

$$ \ell(w)'\beta(u), \quad (u, w) \in I, $$

and, provided that the approximation error $r_n(u, w)$, $(u, w) \in I$, is small enough as compared
to the estimation noise, these tools will also be valid for inference on the functional
of interest:

$$ \theta(u, w), \quad (u, w) \in I. $$

Therefore, the series approximation is an important penultimate target, whereas the functional
$\theta$ is the ultimate target. The inference will be based on the plug-in estimator
$\hat\theta(u, w) := \ell(w)'\hat\beta(u)$ of the series approximation $\ell(w)'\beta(u)$ and hence of the final target
$\theta(u, w)$.



4.2. Pointwise Rates and Inference on the Linear Functionals. Let (u, w) be a pair 
of a quantile index value u and a covariate value w, at which we are interested in performing 
inference on $\theta(u, w)$. In principle, this deterministic value can be implicitly indexed by n,
although we suppress the dependence.

Condition P.

P.1 The approximation error is small, namely $\sqrt{n}\,|r_n(u, w)| / \|\ell(w)\| = o(1)$.
P.2 The norm of the loading $\ell(w)$ satisfies: $\|\ell(w)\| \lesssim \xi_\theta(m, w)$.

Theorem 8 (Pointwise Convergence Rate for Linear Functionals). Assume that the condi- 
tions of Theorem [H and Condition P hold, then 

$$ |\hat\theta(u, w) - \theta(u, w)| \lesssim_P \frac{\xi_\theta(m, w)}{\sqrt{n}}. \tag{4.10} $$

In order to perform inference, we construct an estimator of the variance as

$$ \hat\sigma_n^2(u, w) = u(1 - u)\,\ell(w)' \hat J_m^{-1}(u) \hat\Sigma_m \hat J_m^{-1}(u) \ell(w) / n. \tag{4.11} $$

Under the stated conditions this quantity is consistent for

$$ \sigma_n^2(u, w) = u(1 - u)\,\ell(w)' J_m^{-1}(u) \Sigma_m J_m^{-1}(u) \ell(w) / n. \tag{4.12} $$

Finally, consider the t-statistic:

$$ t_n(u, w) = \frac{\hat\theta(u, w) - \theta(u, w)}{\hat\sigma_n(u, w)}. $$

Condition P also ensures that the approximation error is small, so that

$$ t_n(u, w) = \frac{\ell(w)'(\hat\beta(u) - \beta(u))}{\hat\sigma_n(u, w)} + o_P(1). $$

We can carry out standard inference based on this statistic because $t_n(u, w) \to_d N(0, 1)$.
Moreover, we can base inference on the quantiles of the following statistics:

$$
\begin{aligned}
\text{pivotal coupling} \quad & t_n^*(u, w) = \frac{\ell(w)' \hat J_m^{-1}(u) \mathbb{U}_n^*(u) / \sqrt{n}}{\hat\sigma_n(u, w)}, \\
\text{gradient bootstrap coupling} \quad & t_n^*(u, w) = \frac{\ell(w)'(\hat\beta^*(u) - \hat\beta(u))}{\hat\sigma_n(u, w)}, \\
\text{weighted bootstrap coupling} \quad & t_n^*(u, w) = \frac{\ell(w)'(\hat\beta^b(u) - \hat\beta(u))}{\hat\sigma_n(u, w)}.
\end{aligned} \tag{4.13}
$$

Accordingly, let $k_n(1-\alpha)$ be the $1-\alpha/2$ quantile of the standard normal variable, or let
$k_n(1-\alpha)$ denote the $1-\alpha$ quantile of the random variable $|t_n^*(u,w)|$ conditional on the data,
i.e. $k_n(1-\alpha) = \inf\{t : P(|t_n^*(u,w)| \le t \mid \mathcal{D}_n) \ge 1-\alpha\}$. Then a pointwise $(1-\alpha)$-confidence
interval can be formed as

$$[\dot\iota(u,w), \ddot\iota(u,w)] = [\hat\theta(u,w) - k_n(1-\alpha)\hat\sigma_n(u,w),\ \hat\theta(u,w) + k_n(1-\alpha)\hat\sigma_n(u,w)].$$
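
The construction of the pointwise interval can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names are ours, the point estimate and standard error are taken as given inputs, and the numerical values in the usage lines are hypothetical rather than results from the paper.

```python
import math
from statistics import NormalDist


def normal_critical_value(alpha):
    """1 - alpha/2 quantile of the standard normal distribution."""
    return NormalDist().inv_cdf(1.0 - alpha / 2.0)


def simulated_critical_value(t_star_draws, alpha):
    """Smallest t such that the empirical P(|t*| <= t) is at least 1 - alpha."""
    ordered = sorted(abs(t) for t in t_star_draws)
    rank = max(math.ceil((1.0 - alpha) * len(ordered)), 1)
    return ordered[rank - 1]


def pointwise_interval(theta_hat, sigma_hat, k):
    """Interval theta_hat -/+ k * sigma_hat."""
    return (theta_hat - k * sigma_hat, theta_hat + k * sigma_hat)


# Hypothetical inputs: an estimate of -0.74 with standard error 0.10.
k_normal = normal_critical_value(0.10)
lower, upper = pointwise_interval(-0.74, 0.10, k_normal)
```

The simulated critical value takes draws of the coupling statistic from any of the three methods; the normal critical value needs no simulation.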

Theorem 9 (Pointwise Inference for Linear Functionals) . Suppose that the conditions of 
Theorems^ and^ and Condition P hold. Then, 

$$t_n(u,w) \to_d N(0,1).$$

For the case of using the gradient bootstrap method, suppose that the conditions of Theorem 
[7] hold. For the case of using the weighted bootstrap method, suppose that the conditions of 
Theorem^ hold. Then, t^(u,w) — >d N(0, 1) conditional on the data. 

Moreover, the $(1-\alpha)$-confidence band $[\dot\iota(u,w), \ddot\iota(u,w)]$ covers $\theta(u,w)$ with probability that
is asymptotically no less than $1-\alpha$, namely

$$P\{\theta(u,w) \in [\dot\iota(u,w), \ddot\iota(u,w)]\} \ge 1-\alpha + o(1).$$

4.3. Uniform Rates and Inference on the Linear Functionals. For $f : I \mapsto \mathbb{R}^k$,
define the norm

$$\|f\|_I := \sup_{(u,w) \in I} \|f(u,w)\|.$$

We shall invoke the following assumptions to establish rates and uniform inference results 
over the region /. 

Condition U. 

U.1 The approximation error is small, namely $\sqrt{n}\,\log n \sup_{(u,w) \in I} |r_n(u,w)|/\|\ell(w)\| = o(1)$.

U.2 The loadings $\ell(w)$ are uniformly bounded and admit Lipschitz coefficients $\xi^L_\theta(m,I)$,
that is,

$$\|\ell\|_I \lesssim \xi_\theta(m,I), \quad \|\ell(w) - \ell(w')\| \le \xi^L_\theta(m,I)\,\|w - w'\|, \quad \text{and}$$

$$\log[\mathrm{diam}(I) \vee \xi_\theta(m,I) \vee \xi^L_\theta(m,I) \vee \zeta_m] \lesssim \log m.$$

Theorem 10 (Uniform Convergence Rate for Linear Functionals) . Assume that the condi- 
tions of Theorem^ and Condition U hold, and d£g(m, I) Cm log 2 n = o(n), then 

$$\sup_{(u,w) \in I} |\hat\theta(u,w) - \theta(u,w)| \lesssim_P \frac{\xi_\theta(m,I)}{\sqrt{n}} \log n. \tag{4.14}$$

As in the pointwise case, consider the estimator of the variance $\hat\sigma_n^2(u,w)$ in (4.11). Under
our conditions this quantity is uniformly consistent for $\sigma_n^2(u,w)$ defined in (4.12), namely
$\hat\sigma_n^2(u,w)/\sigma_n^2(u,w) = 1 + o_P(1/\log n)$ uniformly over $(u,w) \in I$. Then we consider the
t-statistic process:

$$\left\{ t_n(u,w) = \frac{\hat\theta(u,w) - \theta(u,w)}{\hat\sigma_n(u,w)}, \ (u,w) \in I \right\}.$$

Under our assumptions the approximation error is small, so that

$$t_n(u,w) = \frac{\ell(w)'(\hat\beta(u) - \beta(u))}{\hat\sigma_n(u,w)} + o_P(1/\log n) \quad \text{in } \ell^\infty(I).$$

The main result on inference is that the t-statistic process can be strongly approximated
by the following pivotal processes or couplings:

$$\begin{aligned}
\text{pivotal coupling} \quad & \left\{ t_n^*(u,w) = \frac{\ell(w)'\hat J_m^{-1}(u)\,\mathbb{U}_n^*(u)/\sqrt{n}}{\hat\sigma_n(u,w)}, \ (u,w) \in I \right\}; \\
\text{gradient bootstrap coupling} \quad & \left\{ t_n^*(u,w) = \frac{\ell(w)'(\hat\beta^*(u) - \hat\beta(u))}{\hat\sigma_n(u,w)}, \ (u,w) \in I \right\}; \\
\text{Gaussian coupling} \quad & \left\{ t_n^*(u,w) = \frac{\ell(w)'\hat J_m^{-1}(u)\,G_n(u)/\sqrt{n}}{\hat\sigma_n(u,w)}, \ (u,w) \in I \right\}; \\
\text{weighted bootstrap coupling} \quad & \left\{ t_n^*(u,w) = \frac{\ell(w)'(\hat\beta^b(u) - \hat\beta(u))}{\hat\sigma_n(u,w)}, \ (u,w) \in I \right\}.
\end{aligned} \tag{4.15}$$

The following theorem shows that these couplings approximate the distribution of the t-
statistic process in large samples.

Theorem 11 (Strong Approximation of Inferential Processes by Couplings). Suppose that
the conditions of Theorems \M and and Condition U hold. For the case of using the 
gradient bootstrap method, suppose that the conditions of Theorem [7] hold. For the case of 
using the Gaussian approximation, suppose that the conditions of Theorem [5| hold. For the 
case of using the weighted bootstrap method, suppose that the conditions of Theorem\^hold. 
Then, 

$$t_n(u,w) =_d t_n^*(u,w) + o_P(1/\log n), \quad \text{in } \ell^\infty(I),$$
where $P$ can be replaced by $P^*$.

To construct uniform two-sided confidence bands for $\{\theta(u,w) : (u,w) \in I\}$, we consider
the maximal t-statistic

$$\|t_n\|_I = \sup_{(u,w) \in I} |t_n(u,w)|,$$

as well as the couplings to this statistic in the form:

$$\|t_n^*\|_I = \sup_{(u,w) \in I} |t_n^*(u,w)|.$$

Ideally, we would like to use quantiles of the first statistic as critical values, but we do not
know them. We instead use quantiles of the second statistic as large sample approximations.
Let $k_n(1-\alpha)$ denote the $1-\alpha$ quantile of the random variable $\|t_n^*\|_I$ conditional on the data,
i.e.

$$k_n(1-\alpha) = \inf\{t : P(\|t_n^*\|_I \le t \mid \mathcal{D}_n) \ge 1-\alpha\}.$$

This quantity can be computed numerically by Monte Carlo methods, as we illustrate in 
the empirical section. 

Let $\delta_n > 0$ be a finite sample expansion factor such that $\delta_n \log^{1/2} n \to 0$ but $\delta_n \log n \to \infty$.
For example, we recommend to set $\delta_n = 1/(4\log^{3/4} n)$. Then for $c_n(1-\alpha) = k_n(1-\alpha) + \delta_n$
we define the confidence bands of asymptotic level $1-\alpha$ to be

$$[\dot\iota(u,w), \ddot\iota(u,w)] = [\hat\theta(u,w) - c_n(1-\alpha)\hat\sigma_n(u,w),\ \hat\theta(u,w) + c_n(1-\alpha)\hat\sigma_n(u,w)], \quad (u,w) \in I.$$
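
As a rough illustration of how the critical value and the band fit together, the following Python sketch takes simulated draws of the maximal coupling statistic on a finite grid as given. The function names are ours, and the exponent 3/4 in the expansion factor is our reading of the recommended choice; any sequence satisfying the two rate conditions on the expansion factor would serve equally well.

```python
import math


def expansion_factor(n):
    """delta_n = 1 / (4 * log(n)^(3/4)); the exponent 3/4 is an assumed reading."""
    return 1.0 / (4.0 * math.log(n) ** 0.75)


def uniform_critical_value(max_t_draws, alpha, n):
    """c_n(1 - alpha) = k_n(1 - alpha) + delta_n from draws of the maximal statistic."""
    ordered = sorted(max_t_draws)
    rank = max(math.ceil((1.0 - alpha) * len(ordered)), 1)
    k_n = ordered[rank - 1]  # empirical 1 - alpha quantile of the draws
    return k_n + expansion_factor(n)


def uniform_band(theta_hat, sigma_hat, c_n):
    """Band theta_hat -/+ c_n * sigma_hat at every grid point of I."""
    return [(t - c_n * s, t + c_n * s) for t, s in zip(theta_hat, sigma_hat)]
```

With n = 5,001 the expansion factor is roughly 0.05, which is small relative to critical values of about 2.6 to 3.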

The following theorem establishes the asymptotic validity of these confidence bands. 

Theorem 12 (Uniform Inference for Linear Functionals). Suppose that the conditions of
Theorem 11 hold for a given coupling method.
(1) Then

$$P\{\|t_n\|_I \le c_n(1-\alpha)\} \ge 1-\alpha + o(1). \tag{4.16}$$

(2) As a consequence, the confidence bands constructed above cover $\theta(u,w)$ uniformly for
all $(u,w) \in I$ with probability that is asymptotically no less than $1-\alpha$, namely

$$P\{\theta(u,w) \in [\dot\iota(u,w), \ddot\iota(u,w)], \text{ for all } (u,w) \in I\} \ge 1-\alpha + o(1). \tag{4.17}$$

(3) The width of the confidence band $2c_n(1-\alpha)\hat\sigma_n(u,w)$ obeys uniformly in $(u,w) \in I$:

$$2c_n(1-\alpha)\hat\sigma_n(u,w) = 2k_n(1-\alpha)(1 + o_P(1))\sigma_n(u,w). \tag{4.18}$$

Furthermore, if $\|t_n^*\|_I$ does not concentrate at $k_n(1-\alpha)$ at a rate faster than $\sqrt{\log n}$,
that is, it obeys the anti-concentration property $P(\|t_n^*\|_I \le k_n(1-\alpha) + \varepsilon_n) = 1-\alpha + o(1)$
for any $\varepsilon_n = o(1/\sqrt{\log n})$, then the inequalities in (4.16) and (4.17) hold as equalities, and
the finite sample adjustment factor $\delta_n$ could be set to zero.

The theorem shows that the confidence bands constructed above maintain the required 
level asymptotically and establishes that the uniform width of the bands is of the same 
order as the uniform rate of convergence. Moreover, under anti-concentration the confidence 
intervals are asymptotically similar. 

Comment 4.1. This inferential strategy builds on [15], who proposed a similar strategy
for inference on the minimum of a function. The idea was not to find the limit distribution,
which may not exist in some cases, but to use distributions provided by couplings. Since the
limit distribution need not exist, it is not immediately clear that the confidence intervals
maintain the right asymptotic level. However, the additional adjustment factor $\delta_n$ assures
the right asymptotic level. A small price to pay for using the adjustment $\delta_n$ is that the
confidence intervals may not be similar, i.e. they may remain asymptotically conservative in coverage.
However, the width of the confidence intervals is not asymptotically conservative, since $\delta_n$
is negligible compared to $k_n(1-\alpha)$. If an additional property, called anti-concentration,
holds, then the confidence intervals automatically become asymptotically similar. The
anti-concentration property holds if, after appropriate scaling by some deterministic sequences
$a_n$ and $b_n$, the inferential statistic $a_n(\|t_n\|_I - b_n)$ has a continuous limit distribution. More
generally, it holds if for any subsequence of integers $\{n_k\}$ there is a further subsequence $\{n_{k_r}\}$
along which $a_{n_{k_r}}(\|t_{n_{k_r}}\|_I - b_{n_{k_r}})$ has a continuous limit distribution, possibly dependent on
the subsequence. For an example of the latter see [9], where certain inferential processes
converge in distribution to tight Gaussian processes subsubsequentially. We expect
anti-concentration to hold in our case, but our constructions and results do not critically hinge
on it.

Comment 4.2 (Primitive Bounds on $\xi_\theta(m,I)$). The results of this section rely on the
quantity $\xi_\theta(m,I)$. The value of $\xi_\theta(m,I)$ depends on the choice of basis for the series estimator
and on the type of the linear functional. Here we discuss the case of regression splines and
refer to [29] and [12] for other choices of basis. After a possible renormalization of $X$, we
assume its support is $\mathcal{X} = [-1,1]^d$. For splines it has been established that $\zeta_m \lesssim \sqrt{m}$ and
$\max_k \sup_{x \in \mathcal{X}} \|\partial_{x_k} Z(x)\| \lesssim m^{1/2+k}$, [29]. Then we have that for

• the function itself: $\theta(u,x) = Q_{Y|X}(u|x)$, $\ell(x) = Z(x)$, $\xi_\theta(m,I) \lesssim \sqrt{m}$;

• the derivative: $\theta(u,x) = \partial_{x_k} Q_{Y|X}(u|x)$, $\ell(x) = \partial_{x_k} Z(x)$, $\xi_\theta(m,I) \lesssim m^{3/2}$;

• the average derivative: $\theta(u) = \int \partial_{x_k} Q_{Y|X}(u|x)\,d\mu(x)$, $\mathrm{supp}(\mu) \subset \mathrm{int}\,\mathcal{X}$, $|\partial_{x_k}\mu(x)| \lesssim 1$,
$\ell = \int \partial_{x_k} Z(x)\mu(x)\,dx = -\int Z(x)\,\partial_{x_k}\mu(x)\,dx$, $\xi_\theta(m) \lesssim 1$.

4.4. Imposing Monotonicity on Linear Functionals. The functionals of interest might 
be naturally monotone in some of their arguments. For example, the conditional quantile 
function is increasing in the quantile index and the conditional quantile demand function 
is decreasing in price and increasing in the quantile index. Therefore, it might be desirable 
to impose the same requirements on the estimators of these functions. 

Let $\theta(u,w)$, where $(u,w) \in I$, be a weakly increasing function in $(u,w)$, i.e. $\theta(u',w') \le
\theta(u,w)$ whenever $(u',w') \le (u,w)$ componentwise.^1 Let $\hat\theta$ and $[\dot\iota, \ddot\iota]$ be the point and band
estimators of $\theta$, constructed using one of the methods described in the previous sections.
These estimators might not satisfy the monotonicity requirement due to either estimation
error or imperfect approximation. However, we can monotonize these estimates and perform
inference using the following method, suggested in [14].

Let $q, f : I \mapsto K$, where $K$ is a bounded subset of $\mathbb{R}$, and consider any monotonization
operator $\mathcal{M}$ that satisfies: (1) a monotone-neutrality condition

$$\mathcal{M}q = q \ \text{ if } q \text{ monotone}; \tag{4.19}$$

(2) a distance-reducing condition

$$\|\mathcal{M}q - \mathcal{M}f\|_I \le \|q - f\|_I; \tag{4.20}$$

and (3) an order-preserving condition

$$q \le f \ \text{ implies } \ \mathcal{M}q \le \mathcal{M}f. \tag{4.21}$$

Examples of operators that satisfy these conditions include:

1. multivariate rearrangement [14],

2. isotonic projection [4],

3. convex combinations of rearrangement and isotonic regression [14], and

4. convex combinations of monotone minorants and monotone majorants.

^1 If $\theta(u,w)$ is decreasing in $w$, we take the transformation $\tilde w = -w$ and $\tilde\theta(u,\tilde w) = \theta(u,-\tilde w)$, where $\tilde\theta(u,\tilde w)$
is increasing in $\tilde w$.
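
To make the first operator in this list concrete, here is a minimal Python sketch of the univariate increasing rearrangement, assuming the function is tabulated on an equally spaced grid, in which case the rearrangement reduces to sorting the values. The function names and the numbers in the usage lines are hypothetical and only meant to show the distance-reducing behavior on one example.

```python
def rearrange(values):
    """Increasing rearrangement of a function tabulated on an equally spaced grid."""
    return sorted(values)


def sup_distance(f, g):
    """Sup-norm distance between two functions on the same grid."""
    return max(abs(a - b) for a, b in zip(f, g))


def rearrange_band(lower, upper):
    """Rearrange the lower and upper end-point functions separately."""
    return rearrange(lower), rearrange(upper)


# Hypothetical monotone target and a non-monotone estimate of it.
truth = [1.0, 2.0, 3.0, 4.0]
estimate = [1.2, 2.9, 2.1, 4.1]
monotone_estimate = rearrange(estimate)
```

In this example the sup-norm error falls from 0.9 to 0.2 after rearrangement, and a function that is already monotone is left unchanged.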

The following result establishes that monotonizing point estimators reduces estimation
error, and that monotonizing confidence bands increases coverage while reducing the length
of the bands. It follows from Theorem 12 using the same arguments as
in Propositions 2 and 3 in [14].



Corollary 1 (Inference on Monotone Linear Functionals). Let $\theta : I \mapsto K$ be weakly increasing
over $I$ and $\hat\theta$ be the QR series estimator of Theorem 10. If $\mathcal{M}$ satisfies the conditions (4.19)
and (4.20), then the monotonized QR estimator is necessarily closer to the true value:

$$\|\mathcal{M}\hat\theta - \theta\|_I \le \|\hat\theta - \theta\|_I.$$

Let $[\dot\iota, \ddot\iota]$ be a confidence band for $\theta$ of Theorem 12. If $\mathcal{M}$ satisfies the conditions (4.19)
and (4.21), the monotonized confidence bands maintain the asymptotic level of the original
intervals:

$$P\{\theta(u,w) \in [\mathcal{M}\dot\iota(u,w), \mathcal{M}\ddot\iota(u,w)] : (u,w) \in I\} \ge 1-\alpha + o(1).$$

If $\mathcal{M}$ satisfies the condition (4.20), the monotonized confidence bands are shorter in length
than the original monotone intervals:

$$\|\mathcal{M}\ddot\iota - \mathcal{M}\dot\iota\|_I \le \|\ddot\iota - \dot\iota\|_I.$$

Comment 4.3. Another strategy, as mentioned e.g. in p3], is to simply intersect the initial 
confidence interval with the set of monotone functions. This is done simply by taking the 
smallest majorant of the lower band and the largest minorant of the upper band. This, in 
principle, produces shortest intervals with desired asymptotic level. However, this may not 
have good properties under misspecification, i.e. when approximation error is not relatively 
small (we do not analyze such cases in this paper, but they can occur in practice), whereas 
the strategy explored in the corollary retains a number of good properties even in this 
case. See [2] for a detailed discussion of this point and for practical guidance on choosing
particular monotonization schemes.

5. Examples

This section illustrates the finite sample performance of the estimation and inference
methods with two examples. All the calculations were carried out with the software R
([33]), using the package quantreg for quantile regression ([26]).

5.1. Empirical Example. To illustrate our methods with real data, we consider an
empirical application on nonparametric estimation of the demand for gasoline. [21], [34], and [39]
estimated nonparametrically the average demand function. We estimate nonparametrically
the quantile demand and elasticity functions and apply our inference methods to construct 
confidence bands for the average quantile elasticity function. We use the same data set as 
in [39] , which comes from the National Private Vehicle Use Survey, conducted by Statistics 
Canada between October 1994 and September 1996. The main advantage of this data set, 
relative to similar data sets for the U.S., is that it is based on fuel purchase diaries and 
contains detailed household level information on prices, fuel consumption patterns, vehicles 
and demographic characteristics. (See [39] for a more detailed description of the data.) 
Our sample selection and variable construction also follow [39]. We select into the sample 
households with non-zero licensed drivers, vehicles, and distance driven. We focus on regu- 
lar grade gasoline consumption. This selection leaves us with a sample of 5,001 households. 
Fuel consumption and expenditure are recorded by the households at the purchase level. 

We consider the following empirical specification: 

$$Y = Q_{Y|X}(U|X), \quad Q_{Y|X}(U|X) = g(W,U) + V'\beta(U), \quad X = (W,V),$$

where Y is the log of total gasoline consumption in liters per month; W is the log of 
price per liter; U is the unobservable preference of the household to consume gasoline; 
and V is a vector of 28 covariates. Following [35], the covariate vector includes the log 
of age, a dummy for the top coded value of age, the log of income, a set of dummies for 
household size, a dummy for urban dwellers, a dummy for young-single (age less than 36 and 
household size of one), the number of drivers, a dummy for more than 4 drivers, 5 province 
dummies, and 12 monthly dummies. To estimate the function g(W, U), we consider three 
series approximations in W: linear, a power orthogonal polynomial of degree 6, and a cubic 
B-spline with 5 knots at the {0,1/4,1/2,3/4,1} quantiles of the observed values of W. 
The number of series terms is selected by undersmoothing over the specifications chosen
by applying least squares cross validation to the corresponding conditional average demand
functions. In the next section, we analyze the size of the specification error of these series
approximations in a numerical experiment calibrated to mimic this example.

The empirical results are reported in Figures 1-3. Fig. 1 plots the initial and rearranged
estimates of the quantile demand surface for gasoline as a function of price and the quantile
index, that is

$$(u, \exp(w)) \mapsto \theta(u,w) = \exp(g(w,u) + v'\beta(u)),$$

where the value of $v$ is fixed at the sample median values of the ordinal variables and one
for the dummies corresponding to the sample modal values of the rest of the variables.^2 The
rows of the figure correspond to the three series approximations. The monotonized estimates
in the right panels are obtained using the average rearrangement over both the price and
quantile dimensions proposed in [14]. The power and B-spline series approximations show
most noticeably non-monotone areas with respect to price at high quantiles, which are
removed by the rearrangement.

Fig. 2 shows series estimates of the quantile elasticity surface as a function of price and
the quantile index, that is:

$$(u, \exp(w)) \mapsto \theta(u,w) = \partial_w g(w,u).$$

The estimates from the linear approximation show that the elasticity decreases with the
quantile index in the middle of the distribution, but this pattern is reversed at the tails.
The power and B-spline estimates show substantial heterogeneity of the elasticity across
prices, with individuals at the high quantiles being more sensitive to high prices.^3

Fig. 3 shows 90% uniform confidence bands for the average quantile elasticity function.
The rows of the figure correspond to the three series approximations and the columns
correspond to the inference methods. We construct the bands using the pivotal, Gaussian
and weighted bootstrap methods. For the pivotal and Gaussian methods the distribution
of the maximal t-statistic is obtained by 1,000 simulations. The weighted bootstrap uses
standard exponential weights and 199 repetitions. The confidence bands show that the
evidence of heterogeneity in the elasticities across quantiles is not statistically significant,
because we can trace a horizontal line within the bands. They show, however, that there is
significant evidence of negative price sensitivity at most quantiles as the bands are bounded
away from zero for most quantiles.

^2 The median values of the ordinal covariates are $40K for income, 46 for age, and 2 for the number of
drivers. The modal values for the rest of the covariates are 0 for the top-coding of age, 2 for household size,
1 for urban dwellers, 0 for young-single, 0 for the dummy of more than 4 drivers, 4 (Prairie) for province,
and 11 (November) for month.

^3 These estimates are smoothed by local weighted polynomial regression across the price dimension ([E]),
because the unsmoothed elasticity estimates display very erratic behavior.

Figure 1. Quantile demand surfaces for gasoline as a function of price and
the quantile index. The left panels display linear, power and B-spline series
estimates and the right panels show the corresponding estimates
monotonized by rearrangement over both dimensions.

5.2. Numerical Example. To evaluate the performance of our estimation and inference 
methods in finite samples, we conduct a Monte Carlo experiment designed to mimic the 
previous empirical example. We consider the following design for the data generating pro- 
cess: 

$$Y = g(W) + V'\beta + \sigma\Phi^{-1}(U), \tag{5.22}$$

where $g(w) = \alpha_0 + \alpha_1 w + \alpha_2\sin(2\pi w) + \alpha_3\cos(2\pi w) + \alpha_4\sin(4\pi w) + \alpha_5\cos(4\pi w)$, $V$ is
the same covariate vector as in the empirical example, $U \sim U(0,1)$, and $\Phi^{-1}$ denotes the
inverse of the CDF of the standard normal distribution. The parameters of $g(w)$ and $\beta$
are calibrated by applying least squares to the data set in the empirical example and $\sigma$ is
calibrated to the least squares residual standard deviation. We consider linear, power and
B-spline series methods to approximate $g(w)$, with the same number of series terms and
other tuning parameters as in the empirical example.
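
A minimal Python sketch of drawing from a design of this form is given below. It is an illustration only: the coefficient values and the residual scale are hypothetical placeholders rather than the calibrated values, the covariate term is dropped, and the function names are ours.

```python
import math
import random
from statistics import NormalDist

# Hypothetical placeholders for (alpha_0, ..., alpha_5) and sigma; not the calibrated values.
ALPHA = (1.0, -0.7, 0.1, 0.05, -0.05, 0.02)
SIGMA = 0.5


def g(w, alpha=ALPHA):
    """Linear term in w plus two sine-cosine harmonics."""
    return (alpha[0] + alpha[1] * w
            + alpha[2] * math.sin(2 * math.pi * w) + alpha[3] * math.cos(2 * math.pi * w)
            + alpha[4] * math.sin(4 * math.pi * w) + alpha[5] * math.cos(4 * math.pi * w))


def true_quantile(u, w, sigma=SIGMA):
    """Conditional u-quantile g(w) + sigma * Phi^{-1}(u), with the covariate term dropped."""
    return g(w) + sigma * NormalDist().inv_cdf(u)


def draw_outcome(w, rng, sigma=SIGMA):
    """One draw of Y = g(w) + sigma * Phi^{-1}(U) with U ~ U(0,1)."""
    u = rng.random()
    while u == 0.0:  # inv_cdf is undefined at 0, so redraw in that measure-zero case
        u = rng.random()
    return true_quantile(u, w, sigma)
```

Because the quantile index enters only through the additive normal term, the derivative of the conditional quantile with respect to w does not depend on u in this design.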

Figures 4 and 5 examine the quality of the series approximations in population. They
compare the true quantile function

$$(u, \exp(w)) \mapsto \theta(u,w) = g(w) + v'\beta + \sigma\Phi^{-1}(u),$$

and the quantile elasticity function

$$(u, \exp(w)) \mapsto \theta(u,w) = \partial_w g(w),$$

to the estimands of the series approximations. In the quantile demand function the value of
$v$ is fixed at the sample median values of the ordinal variables and at one for the dummies
corresponding to the sample modal values of the rest of the variables. The estimands are
obtained numerically from a mega-sample (a proxy for infinite population) of 100 × 5,001
observations with the values of $(W,V)$ as in the data set (repeated 100 times) and with
$Y$ generated from the DGP (5.22). Although the derivative function does not depend on
$u$ in our design, we do not impose this restriction on the estimands. Both figures show
that the power and B-spline estimands are close to the true target functions, whereas the
more parsimonious linear approximation misses important curvature features of the target
functions, especially in the elasticity function.

Figure 3. 90% Confidence bands for the average quantile elasticity function.
Pivotal and Gaussian bands are obtained by 1,000 simulations.
Weighted bootstrap bands are based on 199 bootstrap repetitions with
standard exponential weights.

Figure 4. Estimands of the quantile demand surface. Estimands for the
linear, power and B-spline series estimators are obtained numerically using
500,100 simulations.

Figure 5. Estimands of the quantile elasticity surface. Estimands for the
linear, power and B-spline series estimators are obtained numerically using
500,100 simulations.

To analyze the properties of the inference methods in finite samples, we draw 500 samples
from the DGP in equation (5.22) with 3 sample sizes, $n$: 5,001, 1,000, and 500 observations.
For $n = 5{,}001$ we fix $W$ to the values in the data set, whereas for the smaller sample sizes we
draw $W$ with replacement from the values in the data set and keep it fixed across samples.
To speed up computation, we drop the vector $V$ by fixing it at the sample median values
of the ordinal components and at one for the dummies corresponding to the sample modal
values for all the individuals. We focus on the average quantile elasticity function

$$u \mapsto \theta(u)$$

over the region $I = [0.1, 0.9]$. We estimate this function using linear, power and B-spline
quantile regression with the same number of terms and other tuning parameters as in the
empirical example. Although $\theta(u)$ does not change with $u$ in our design, again we do not
impose this restriction on the estimators. For inference, we compare the performance of
90% confidence bands for the entire elasticity function. These bands are constructed using
the pivotal, Gaussian and weighted bootstrap methods, all implemented in the same fashion
as in the empirical example. The interval $I$ is approximated by a finite grid of 91 quantiles
$I = \{0.10, 0.11, \dots, 0.90\}$.

Table 1 reports estimation and inference results averaged across 200 simulations. The
true value of the elasticity function is $\theta(u) = -0.74$ for all $u \in I$. Bias and RMSE are
the absolute bias and root mean squared error integrated over $I$. SE/SD reports the ratios
of empirical average standard errors to empirical standard deviations. SE/SD uses the
analytical standard errors from expression (4.11). The bandwidth for $\hat J_m(u)$ is chosen using
the Hall-Sheather option of the quantreg R package ([19]). Length gives the empirical
average of the length of the confidence band. SE/SD and length are integrated over the
grid of quantiles $I$. Cover reports empirical coverage of the confidence bands with nominal
level of 90%. Stat is the empirical average of the 90% quantile of the maximal t-statistic used
to construct the bands. Table 1 shows that the linear estimator has higher absolute bias
than the more flexible power and B-spline estimators, but displays lower RMSE, especially
for small sample sizes. The analytical standard errors provide good approximations to the
standard deviations of the estimators. The confidence bands have empirical coverage close
to the nominal level of 90% for all the estimators and sample sizes considered; and weighted
bootstrap bands tend to have larger average length than the pivotal and Gaussian bands.

All in all, these results strongly confirm the practical value of the theoretical results and
methods developed in the paper. They also support the empirical example by verifying that
our estimation and inference methods work quite nicely in a very similar setting.

Table 1. Finite Sample Properties of Estimation and Inference Methods
for Average Quantile Elasticity Function

                                     Pivotal               Gaussian          Weighted Bootstrap
            Bias  RMSE  SE/SD   Cover  Length  Stat   Cover  Length  Stat   Cover  Length  Stat

n = 5,001
Linear      0.05  0.14   1.04      90    0.77  2.64      90    0.76  2.64      87    0.82  2.87
Power       0.00  0.15   1.03      91    0.85  2.65      91    0.85  2.65      88    0.91  2.83
B-spline    0.01  0.15   1.02      90    0.86  2.64      88    0.86  2.64      90    0.93  2.84

n = 1,000
Linear      0.03  0.29   1.09      92    1.78  2.64      93    1.78  2.64      90    1.96  2.99
Power       0.03  0.33   1.07      92    2.01  2.66      91    2.00  2.65      95    2.17  2.95
B-spline    0.02  0.35   1.05      90    2.08  2.65      90    2.07  2.65      96    2.21  2.95

n = 500
Linear      0.04  0.45   1.01      88    2.60  2.64      88    2.60  2.64      90    2.84  3.05
Power       0.02  0.52   1.04      90    3.12  2.65      90    3.13  2.66      95    3.29  3.01
B-spline    0.02  0.52   1.04      90    3.25  2.65      90    3.25  2.65      96    3.35  3.00

Notes: 200 repetitions. Simulation standard error for coverage probability is 2%.

Appendix A. Implementation Algorithms

Throughout this section we assume that we have a random sample $\{(Y_i, Z_i) : 1 \le i \le n\}$.
We are interested in approximating the distribution of the process $\sqrt{n}(\hat\beta(\cdot) - \beta(\cdot))$ or of
the statistics associated with functionals of it. Recall that for each quantile $u \in \mathcal{U} \subset (0,1)$,
we estimate $\beta(u)$ by quantile regression $\hat\beta(u) = \arg\min_{\beta \in \mathbb{R}^m} E_n[\rho_u(Y_i - Z_i'\beta)]$, the Gram
matrix $\Sigma_m$ by $\hat\Sigma_m = E_n[Z_i Z_i']$, and the Jacobian matrix $J_m(u)$ by the Powell [32] estimator
$\hat J_m(u) = E_n[1\{|Y_i - Z_i'\hat\beta(u)| \le h_n\} \cdot Z_i Z_i']/2h_n$, where we recommend choosing the bandwidth
$h_n$ as in the quantreg R package with the Hall-Sheather option ([19]).
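
The two matrix estimators can be sketched in plain Python as follows. This is a minimal illustration under stated assumptions: the function names are ours, the regressors are stored as a list of rows, and the bandwidth is supplied by the caller rather than computed by the Hall-Sheather rule.

```python
def gram_matrix(Z):
    """Empirical Gram matrix E_n[Z_i Z_i'] for a list of regressor rows Z."""
    n, m = len(Z), len(Z[0])
    return [[sum(z[j] * z[k] for z in Z) / n for k in range(m)] for j in range(m)]


def powell_jacobian(Y, Z, beta_hat, h):
    """Powell-type estimate E_n[1{|Y_i - Z_i'beta_hat| <= h} Z_i Z_i'] / (2h)."""
    n, m = len(Z), len(Z[0])
    J = [[0.0] * m for _ in range(m)]
    for y, z in zip(Y, Z):
        residual = y - sum(zj * bj for zj, bj in zip(z, beta_hat))
        if abs(residual) <= h:  # only observations near the fitted quantile contribute
            for j in range(m):
                for k in range(m):
                    J[j][k] += z[j] * z[k]
    return [[entry / (2.0 * h * n) for entry in row] for row in J]
```

With an intercept-only design the Jacobian estimate reduces to the share of residuals within the bandwidth divided by twice the bandwidth, which is a uniform-kernel estimate of the conditional density at the quantile.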

We begin by describing the algorithms to implement the methods to approximate the
distribution of the process $\sqrt{n}(\hat\beta(\cdot) - \beta(\cdot))$ indexed by $\mathcal{U}$.

Algorithm 1 (Pivotal method). (1) For $b = 1, \dots, B$, draw $U_1^b, \dots, U_n^b$ i.i.d. from $U \sim
\mathrm{Uniform}(0,1)$ and compute $\mathbb{U}_n^b(u) = n^{-1/2}\sum_{i=1}^n Z_i(u - 1\{U_i^b \le u\})$, $u \in \mathcal{U}$. (2) Approximate
the distribution of $\{\sqrt{n}(\hat\beta(u) - \beta(u)) : u \in \mathcal{U}\}$ by the empirical distribution of
$\{\hat J_m^{-1}(u)\mathbb{U}_n^b(u) : u \in \mathcal{U}, 1 \le b \le B\}$.
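
The two steps of the pivotal method can be sketched in Python as below. This is a sketch under stated assumptions, not the implementation used for the reported results: the function names are ours, the quantile set is replaced by a finite grid, and the inverse Jacobian estimates are passed in as precomputed matrices, one per grid point.

```python
import math
import random


def pivotal_process(Z, grid, uniforms):
    """n^{-1/2} * sum_i Z_i * (u - 1{U_i <= u}) evaluated at each u in the grid."""
    n, m = len(Z), len(Z[0])
    process = []
    for u in grid:
        total = [0.0] * m
        for z, draw in zip(Z, uniforms):
            score = u - (1.0 if draw <= u else 0.0)
            for j in range(m):
                total[j] += z[j] * score
        process.append([t / math.sqrt(n) for t in total])
    return process


def apply_matrix(A, v):
    """Matrix-vector product for a matrix stored as a list of rows."""
    return [sum(a * x for a, x in zip(row, v)) for row in A]


def pivotal_draws(Z, grid, J_inv_by_u, B, seed=0):
    """B simulated paths of J_inv(u) * U_n^b(u) over the grid."""
    rng = random.Random(seed)
    draws = []
    for _ in range(B):
        uniforms = [rng.random() for _ in Z]
        process = pivotal_process(Z, grid, uniforms)
        draws.append([apply_matrix(J_inv, p) for J_inv, p in zip(J_inv_by_u, process)])
    return draws
```

Each simulated path reuses the same uniform draws across all grid points, which is what makes the approximation valid for the whole process rather than one quantile at a time.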

Algorithm 2 (Gaussian method). (1) For $b = 1, \dots, B$, generate an $m$-dimensional standard
Brownian bridge on $\mathcal{U}$, $B_m^b(\cdot)$. Define $G_n^b(u) = \hat\Sigma_m^{1/2} B_m^b(u)$ for $u \in \mathcal{U}$. (2) Approximate the
distribution of $\{\sqrt{n}(\hat\beta(u) - \beta(u)) : u \in \mathcal{U}\}$ by the empirical distribution of $\{\hat J_m^{-1}(u) G_n^b(u) :
u \in \mathcal{U}, 1 \le b \le B\}$.

Algorithm 3 (Weighted bootstrap method). (1) For $b = 1, \dots, B$, draw $h_1^b, \dots, h_n^b$ i.i.d.
from the standard exponential distribution and compute the weighted quantile regression
process $\hat\beta^b(u) = \arg\min_{\beta \in \mathbb{R}^m} \sum_{i=1}^n h_i^b \cdot \rho_u(Y_i - Z_i'\beta)$, $u \in \mathcal{U}$. (2) Approximate the distribution
of $\{\sqrt{n}(\hat\beta(u) - \beta(u)) : u \in \mathcal{U}\}$ by the empirical distribution of $\{\sqrt{n}(\hat\beta^b(u) - \hat\beta(u)) : u \in \mathcal{U}, 1 \le
b \le B\}$.
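
For intuition, the weighted bootstrap can be sketched in the simplest special case of an intercept-only model, where the weighted quantile regression solution is a weighted sample quantile. This restriction is our simplification for illustration; the general case requires a quantile regression solver that accepts observation weights. The function names are ours.

```python
import math
import random


def weighted_quantile(y, weights, u):
    """A minimizer of sum_i h_i * rho_u(y_i - b) over scalars b."""
    pairs = sorted(zip(y, weights))
    total = sum(w for _, w in pairs)
    cumulative = 0.0
    for value, w in pairs:
        cumulative += w
        if cumulative >= u * total:  # first point whose cumulative weight share reaches u
            return value
    return pairs[-1][0]


def weighted_bootstrap_draws(y, u, B, seed=0):
    """B draws of sqrt(n) * (beta_b(u) - beta_hat(u)) with standard exponential weights."""
    rng = random.Random(seed)
    n = len(y)
    beta_hat = weighted_quantile(y, [1.0] * n, u)
    draws = []
    for _ in range(B):
        h = [rng.expovariate(1.0) for _ in range(n)]
        draws.append(math.sqrt(n) * (weighted_quantile(y, h, u) - beta_hat))
    return draws
```

To obtain a process in the quantile index, the same weights would be reused for every u in the grid within each repetition.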

Algorithm 4 (Gradient bootstrap method). (1) For $b = 1, \dots, B$, draw $U_1^b, \dots, U_n^b$ i.i.d.
from $U \sim \mathrm{Uniform}(0,1)$ and compute $\mathbb{U}_n^b(u) = n^{-1/2}\sum_{i=1}^n Z_i(u - 1\{U_i^b \le u\})$, $u \in \mathcal{U}$. (2)
For $b = 1, \dots, B$, estimate the quantile regression process $\hat\beta^b(u) = \arg\min_{\beta \in \mathbb{R}^m} \sum_{i=1}^n \rho_u(Y_i -
Z_i'\beta) + \rho_u(Y_{n+1} - X_{n+1}^b(u)'\beta)$, $u \in \mathcal{U}$, where $X_{n+1}^b(u) = -\sqrt{n}\,\mathbb{U}_n^b(u)/u$, and $Y_{n+1} =
n \max_{1 \le i \le n} |Y_i|$ to ensure $Y_{n+1} > X_{n+1}^b(u)'\hat\beta^b(u)$, for all $u \in \mathcal{U}$. (3) Approximate the
distribution of $\{\sqrt{n}(\hat\beta(u) - \beta(u)) : u \in \mathcal{U}\}$ by the empirical distribution of $\{\sqrt{n}(\hat\beta^b(u) - \hat\beta(u)) :
u \in \mathcal{U}, 1 \le b \le B\}$.

The previous algorithms provide approximations to the distribution of $\sqrt{n}(\hat\beta(u) - \beta(u))$
that are uniformly valid in $u \in \mathcal{U}$. We can use these approximations directly to make
inference on linear functionals of $Q_{Y|X}(\cdot|X)$ including the conditional quantile functions,
provided the approximation error is small as stated in Theorems 9 and 11. Each linear
functional is represented by $\{\theta(u,w) = \ell(w)'\beta(u) + r_n(u,w) : (u,w) \in I\}$, where $\ell(w)'\beta(u)$
is the series approximation, $\ell(w) \in \mathbb{R}^m$ is a loading vector, $r_n(u,w)$ is the remainder term,
and $I$ is the set of pairs of quantile indices and covariate values of interest; see Section
4 for details and examples. Next we provide algorithms to conduct pointwise or uniform
inference over linear functionals.

Let B be a pre-specified number of bootstrap or simulation repetitions. 

Algorithm 5 (Pointwise Inference for Linear Functionals). (1) Compute the variance
estimate $\hat\sigma_n^2(u,w) = u(1-u)\ell(w)'\hat J_m^{-1}(u)\hat\Sigma_m\hat J_m^{-1}(u)\ell(w)/n$. (2) Using any of the Algorithms 1-4,
compute vectors $V_1(u), \dots, V_B(u)$ whose empirical distribution approximates the distribution
of $\sqrt{n}(\hat\beta(u) - \beta(u))$. (3) For $b = 1, \dots, B$, compute the t-statistic

$$t_n^{*b}(u,w) = \left| \frac{\ell(w)'V_b(u)}{\sqrt{n}\,\hat\sigma_n(u,w)} \right|.$$

(4) Form a $(1-\alpha)$-confidence interval for $\theta(u,w)$ as $\ell(w)'\hat\beta(u) \pm k_n(1-\alpha)\hat\sigma_n(u,w)$, where
$k_n(1-\alpha)$ is the $1-\alpha$ sample quantile of $\{t_n^{*b}(u,w) : 1 \le b \le B\}$.

Algorithm 6 (Uniform Inference for Linear Functionals). (1) Compute the variance
estimates $\hat\sigma_n^2(u,w) = u(1-u)\ell(w)'\hat J_m^{-1}(u)\hat\Sigma_m\hat J_m^{-1}(u)\ell(w)/n$ for $(u,w) \in I$. (2) Using any
of the Algorithms 1-4, compute the processes $V_1(\cdot), \dots, V_B(\cdot)$ whose empirical distribution
approximates the distribution of $\{\sqrt{n}(\hat\beta(u) - \beta(u)) : u \in \mathcal{U}\}$. (3) For $b = 1, \dots, B$, compute
the maximal t-statistic $\|t_n^{*b}\|_I = \sup_{(u,w) \in I} \left| \frac{\ell(w)'V_b(u)}{\sqrt{n}\,\hat\sigma_n(u,w)} \right|$. (4) Form a $(1-\alpha)$-confidence band
for $\{\theta(u,w) : (u,w) \in I\}$ as $\{\ell(w)'\hat\beta(u) \pm k_n(1-\alpha)\hat\sigma_n(u,w) : (u,w) \in I\}$, where $k_n(1-\alpha)$
is the $1-\alpha$ sample quantile of $\{\|t_n^{*b}\|_I : 1 \le b \le B\}$.
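
Steps (3) and (4) of the uniform algorithm can be sketched in Python as follows, taking the simulated draws, loadings and standard errors on a finite grid of points of the region of interest as given. The function names and the data layout (one entry per grid point) are our assumptions for illustration.

```python
import math


def maximal_t_statistics(draws, loadings, sigma_hat, n):
    """For each draw b, the maximum over grid points of |l'V_b| / (sqrt(n) * sigma_hat).

    draws[b][j] is the simulated vector V_b at grid point j, loadings[j] is the
    loading vector at that point and sigma_hat[j] the matching standard error.
    """
    stats = []
    for draw in draws:
        stats.append(max(
            abs(sum(l * v for l, v in zip(loading, vector))) / (math.sqrt(n) * s)
            for loading, vector, s in zip(loadings, draw, sigma_hat)
        ))
    return stats


def sample_quantile(values, level):
    """Smallest value whose empirical distribution function is at least level."""
    ordered = sorted(values)
    rank = max(math.ceil(level * len(ordered)), 1)
    return ordered[rank - 1]


def confidence_band(theta_hat, sigma_hat, k):
    """Band theta_hat -/+ k * sigma_hat at every grid point."""
    return [(t - k * s, t + k * s) for t, s in zip(theta_hat, sigma_hat)]
```

The pointwise algorithm is the special case in which the grid contains a single point, so the maximum is taken over one term.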



Appendix B. A result on identification of QR series approximation in
population and its relation to the best $L^2$-approximation

In this section we provide results on identification and approximation properties for the
QR series approximation. In what follows, we denote $z = Z(x) \in \mathcal{Z}$ for some $x \in \mathcal{X}$, and
for a function $h : \mathcal{X} \to \mathbb{R}$ we define

$$Q_u(h) = E[\rho_u(Y - h(X))],$$

so that $\beta(u) \in \arg\min_{\beta \in \mathbb{R}^m} Q_u(Z'\beta)$. Also let $\underline{f} := \inf_{u \in \mathcal{U}, x \in \mathcal{X}} f_{Y|X}(Q_{Y|X}(u|x)|x)$, where
$\underline{f} > 0$ by condition S.2.






Consider the best $L^2$-approximation to the conditional quantile function $g_u(\cdot) = Q_{Y|X}(u|\cdot)$
by a linear combination of the chosen basis, namely

$$\beta^*(u) \in \arg\min_{\beta \in \mathbb{R}^m} E[|Z'\beta - g_u(X)|^2]. \tag{B.23}$$

We consider the following approximation rates associated with $\beta^*(u)$:

$$c_{u,2}^2 = E[|Z'\beta^*(u) - g_u(X)|^2] \quad \text{and} \quad c_{u,\infty} = \sup_{x \in \mathcal{X}, z = Z(x)} |z'\beta^*(u) - g_u(x)|.$$



Lemma 1. Assume that conditions S.2-S-4, CmC u ,2 = o(l) and c Uj00 = o(l) hold. Then, as 
n grows, we have the following approximation properties for (3(u): 

E [\Z'(3(u) - g u (X)\ 2 ] < (16 V 3///)< 2 , E \\Z'f3(u) - Z'(3*(u)\ 2 } < (9 V 8///)c£ )2 and 



sup \z'P(u) - g u (x)\ < c Uj00 + ( m J (9 V 8///)c Ui2 / \J mineig(£ m ). 

i6Af,2=Z(a;) v 



1/2 



> 3£ 



Proof of LemmaUl We assume that E \Z'f3(u) — Z $ (u)\ 2 
otherwise the statements follow directly. The proof proceeds in 4 steps. 

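Step 1 (Main argument). For notational convenience let

$$\bar q = (\underline f^{3/2}/\bar f')\, E[|Z'\beta(u) - g_u(X)|^2]^{3/2} / E[|Z'\beta(u) - g_u(X)|^3].$$

By Steps 2 and 3 below we have respectively

$$Q_u(Z'\beta(u)) - Q_u(g_u) \le \bar f c_{u,2}^2 \qquad \text{(B.24)}$$

and

$$Q_u(Z'\beta(u)) - Q_u(g_u) \ge \frac{\underline f E[|Z'\beta(u) - g_u(X)|^2]}{3} \wedge \frac{\bar q}{3}\sqrt{\underline f E[|Z'\beta(u) - g_u(X)|^2]}. \qquad \text{(B.25)}$$

Thus

$$\frac{\underline f E[|Z'\beta(u) - g_u(X)|^2]}{3} \wedge \frac{\bar q}{3}\sqrt{\underline f E[|Z'\beta(u) - g_u(X)|^2]} \le \bar f c_{u,2}^2.$$

As $n$ grows, since $\sqrt{\bar f c_{u,2}^2} < \bar q/\sqrt{3}$ by Step 4 below, it follows that $E[|Z'\beta(u) - g_u(X)|^2] \le 3c_{u,2}^2 \bar f/\underline f$, which proves the first statement regarding $\beta(u)$.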
The second statement regarding $\beta(u)$ follows since

$$E[|Z'\beta(u) - Z'\beta^*(u)|^2]^{1/2} \le \sqrt{E[|Z'\beta(u) - g_u(X)|^2]} + \sqrt{E[|Z'\beta^*(u) - g_u(X)|^2]} \le \sqrt{3c_{u,2}^2\bar f/\underline f} + c_{u,2}.$$

Finally, the third statement follows by the triangle inequality, S.4, and the second statement:

$$\sup_{x\in\mathcal{X},\,z=Z(x)} |z'\beta(u) - g_u(x)| \le \sup_{x\in\mathcal{X},\,z=Z(x)} |z'\beta^*(u) - g_u(x)| + \zeta_m\|\beta(u) - \beta^*(u)\|$$

$$\le c_{u,\infty} + \zeta_m\sqrt{E[|Z'\beta^*(u) - Z'\beta(u)|^2]/\mathrm{mineig}(\Sigma_m)}$$

$$\le c_{u,\infty} + \zeta_m\sqrt{(9 \vee 8\bar f/\underline f)}\,c_{u,2}/\sqrt{\mathrm{mineig}(\Sigma_m)}.$$

Step 2 (Upper Bound). For any two scalars $w$ and $v$ we have that

$$\rho_u(w - v) - \rho_u(w) = -v(u - 1\{w \le 0\}) + \int_0^v (1\{w \le t\} - 1\{w \le 0\})\,dt. \qquad \text{(B.26)}$$

By (B.26) and the law of iterated expectations, for any measurable function $h$

$$Q_u(h) - Q_u(g_u) = E\Big[\int_0^{h-g_u} F_{Y|X}(g_u + t|X) - F_{Y|X}(g_u|X)\,dt\Big] = E\Big[\int_0^{h-g_u} t f_{Y|X}(g_u + \tilde t_{X,t}|X)\,dt\Big] \le (\bar f/2)E[|h - g_u|^2] \qquad \text{(B.27)}$$

where $\tilde t_{X,t}$ lies between 0 and $t$ for each $t \in [0, h(x) - g_u(x)]$.

Thus, (B.27) with $h(X) = Z'\beta^*(u)$ and $Q_u(Z'\beta(u)) \le Q_u(Z'\beta^*(u))$ imply that

$$Q_u(Z'\beta(u)) - Q_u(g_u) \le Q_u(Z'\beta^*(u)) - Q_u(g_u) \le \bar f c_{u,2}^2.$$
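The identity for the check function used in Step 2 can be verified numerically. The sketch below is an illustration only: it assumes the convention $\rho_u(x) = (u - 1\{x \le 0\})x$ and evaluates the integral of the indicator difference in closed form; the function names are hypothetical.

```python
def rho(u, x):
    """Check function rho_u(x) = (u - 1{x <= 0}) * x."""
    return (u - (1.0 if x <= 0 else 0.0)) * x

def indicator_integral(w, v):
    """Closed form of the integral over t from 0 to v of (1{w <= t} - 1{w <= 0})."""
    if v >= 0:
        # The integrand equals 1 on [w, v] when 0 < w <= v and is 0 otherwise.
        return max(v - w, 0.0) if w > 0 else 0.0
    # For v < 0 the integrand equals -1 on [v, w) when w <= 0; reversing the
    # orientation of the integral turns this into a non-negative contribution.
    return max(w - v, 0.0) if w <= 0 else 0.0

def identity_gap(u, w, v):
    """Absolute difference between the two sides of the identity."""
    lhs = rho(u, w - v) - rho(u, w)
    rhs = -v * (u - (1.0 if w <= 0 else 0.0)) + indicator_integral(w, v)
    return abs(lhs - rhs)

grid = [-2.0, -0.5, 0.0, 0.3, 1.7]
worst_gap = max(identity_gap(u, w, v)
                for u in (0.1, 0.5, 0.9) for w in grid for v in grid)
```

The integral term is always non-negative, which is why taking conditional expectations of the identity yields a non-negative excess loss $Q_u(h) - Q_u(g_u)$.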
Step 3 (Lower Bound). To establish a lower bound note that for a measurable function $h$

$$Q_u(h) - Q_u(g_u) = E\Big[\int_0^{h-g_u} F_{Y|X}(g_u + t|X) - F_{Y|X}(g_u|X)\,dt\Big]$$

$$= E\Big[\int_0^{h-g_u} t f_{Y|X}(g_u|X) + \frac{t^2}{2} f'_{Y|X}(g_u + \tilde t_{X,t}|X)\,dt\Big]$$

$$\ge (\underline f/2)E[|h - g_u|^2] - \frac{1}{6}\bar f' E[|h - g_u|^3]. \qquad \text{(B.28)}$$

If $\sqrt{\underline f E[|Z'\beta(u) - g_u(X)|^2]} \le \bar q$, then $\bar f' E[|Z'\beta(u) - g_u(X)|^3] \le \underline f E[|Z'\beta(u) - g_u(X)|^2]$
and (B.28) with the function $h(Z) = Z'\beta(u)$ yields

$$Q_u(Z'\beta(u)) - Q_u(g_u) \ge \frac{\underline f E[|Z'\beta(u) - g_u(X)|^2]}{3}.$$

On the other hand, if $\sqrt{\underline f E[|Z'\beta(u) - g_u(X)|^2]} > \bar q$, let $h_u(X) = (1 - \alpha)Z'\beta(u) + \alpha g_u(X)$
where $\alpha \in (0, 1)$ is picked so that $\sqrt{\underline f E[|h_u - g_u|^2]} = \bar q$. Then by (B.28) and convexity of
$Q_u$ we have

$$Q_u(Z'\beta(u)) - Q_u(g_u) \ge \frac{\sqrt{\underline f E[|Z'\beta(u) - g_u(X)|^2]}}{\bar q}\,\big(Q_u(h_u) - Q_u(g_u)\big).$$

Next note that $h_u(X) - g_u(X) = (1 - \alpha)(Z'\beta(u) - g_u(X))$, thus

$$\sqrt{\underline f E[|h_u - g_u|^2]} = \bar q = \frac{\underline f^{3/2}}{\bar f'}\frac{E[|Z'\beta(u) - g_u(X)|^2]^{3/2}}{E[|Z'\beta(u) - g_u(X)|^3]} = \frac{\underline f^{3/2}}{\bar f'}\frac{E[|h_u - g_u|^2]^{3/2}}{E[|h_u - g_u|^3]}.$$

Using that and applying (B.28) with $h_u$ we obtain

$$Q_u(h_u) - Q_u(g_u) \ge (\underline f/2)E[|h_u - g_u|^2] - \frac{1}{6}\bar f' E[|h_u - g_u|^3] = \bar q^2/3.$$

Therefore

$$Q_u(Z'\beta(u)) - Q_u(g_u) \ge \frac{\underline f E[|Z'\beta(u) - g_u(X)|^2]}{3} \wedge \frac{\bar q}{3}\sqrt{\underline f E[|Z'\beta(u) - g_u(X)|^2]}.$$



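Step 4. ($\bar f c_{u,2}^2 < \bar q^2/3$ as $n$ grows) Recall that by S.4 $\|Z\| \le \zeta_m$ and that we can assume

$$E[|Z'\beta(u) - Z'\beta^*(u)|^2]^{1/2} \ge 3E[|Z'\beta^*(u) - g_u(X)|^2]^{1/2}.$$

Then, using the relation above (in the second inequality)

$$\frac{E[|Z'\beta(u) - g_u(X)|^2]^{3/2}}{E[|Z'\beta(u) - g_u(X)|^3]} \ge \frac{E[|Z'\beta(u) - g_u(X)|^2]^{1/2}}{\sup_{x\in\mathcal{X},\,z=Z(x)} |z'\beta(u) - g_u(x)|}$$

$$\ge \frac{1}{2}\,\frac{E[|Z'\beta(u) - Z'\beta^*(u)|^2]^{1/2} + E[|Z'\beta^*(u) - g_u(X)|^2]^{1/2}}{\sup_{x\in\mathcal{X},\,z=Z(x)} |z'\beta(u) - z'\beta^*(u)| + \sup_{x\in\mathcal{X},\,z=Z(x)} |z'\beta^*(u) - g_u(x)|}$$

$$\ge \frac{1}{2}\left(\frac{E[|Z'\beta(u) - Z'\beta^*(u)|^2]^{1/2}}{\sup_{x\in\mathcal{X},\,z=Z(x)} |z'\beta(u) - z'\beta^*(u)|} \wedge \frac{E[|Z'\beta^*(u) - g_u(X)|^2]^{1/2}}{\sup_{x\in\mathcal{X},\,z=Z(x)} |z'\beta^*(u) - g_u(x)|}\right)$$

$$\ge \frac{1}{2}\left(\frac{\kappa\|\beta(u) - \beta^*(u)\|}{\zeta_m\|\beta(u) - \beta^*(u)\|} \wedge \frac{c_{u,2}}{c_{u,\infty}}\right),$$

where $\kappa = \sqrt{\mathrm{mineig}(\Sigma_m)}$.

Finally, $\sqrt{\bar f c_{u,2}^2} \le (\underline f^{3/2}/\bar f')(1/2)(\kappa/\zeta_m \wedge c_{u,2}/c_{u,\infty})/\sqrt{3} \le \bar q/\sqrt{3}$ as $n$ grows under the condition
$\zeta_m c_{u,2} = o(1)$, $c_{u,\infty} = o(1)$, and conditions S.2 and S.3. □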



Appendix C. Proofs of Theorems 1-7

In this section we gather the proofs of Theorems 1-7 stated in the main text. We adopt
the standard notation of the empirical process literature [37]. We begin by assuming that
the sequences $m$ and $n$ satisfy $m/n \to 0$ as $m, n \to \infty$. For notational convenience we write
$\psi_i(\beta,u) = Z_i(1\{Y_i \le Z_i'\beta\} - u)$, where $Z_i = Z(X_i)$. Also for any sequence $r = r_n = o(1)$
and any fixed $0 < B < \infty$, we define the set

$$R_{n,m} := \{(u,\beta) \in \mathcal{U} \times \mathbb{R}^m : \|\beta - \beta(u)\| \le Br\}$$

and the following error terms:

$$\epsilon_0(m,n) := \sup_{u\in\mathcal{U}} \|\mathbb{G}_n(\psi_i(\beta(u),u))\|,$$

$$\epsilon_1(m,n) := \sup_{(u,\beta)\in R_{n,m}} \|\mathbb{G}_n(\psi_i(\beta,u)) - \mathbb{G}_n(\psi_i(\beta(u),u))\|,$$

$$\epsilon_2(m,n) := \sup_{(u,\beta)\in R_{n,m}} n^{1/2}\|E[\psi_i(\beta,u)] - E[\psi_i(\beta(u),u)] - J_m(u)(\beta - \beta(u))\|.$$

In what follows, we say that the data are in general position if for any $\gamma \in \mathbb{R}^m$,
$P(Y_i = Z_i'\gamma, \text{ for at least one } i) = 0$. Under the bounded density condition S.2, the data
are in general position. We also assume i.i.d. sampling throughout the appendix,
although this condition is not needed for most of the results.

C.1. Proof of Theorem 1. We start by establishing uniform rates of convergence. The
following technical lemma will be used in the proof of Theorem 1.

Lemma 2 (Rates in Euclidean Norm for Perturbed QR Process). Suppose that $\hat\beta(u)$ is a
minimizer of

$$E_n[\rho_u(Y_i - Z_i'\beta)] + A_n(u)'\beta$$

for each $u \in \mathcal{U}$, and that the perturbation term obeys $\sup_{u\in\mathcal{U}} \|A_n(u)\| \lesssim_P r = o(1)$. The
unperturbed case corresponds to $A_n(\cdot) = 0$. If $\inf_{u\in\mathcal{U}} \mathrm{mineig}[J_m(u)] > J > 0$ and the
conditions

R1. $\epsilon_0(m,n) \lesssim_P \sqrt{n}\,r$,
R2. $\epsilon_1(m,n) \lesssim_P \sqrt{n}\,r$,
R3. $\epsilon_2(m,n) \lesssim_P \sqrt{n}\,r$,

hold, where the constants in the bounds above can be taken to be independent of the constant
$B$ in the definition of $R_{n,m}$, then for any $\varepsilon > 0$, there is a sufficiently large $B$ such that
with probability at least $1 - \varepsilon$, $(u,\hat\beta(u)) \in R_{n,m}$ uniformly over $u \in \mathcal{U}$, that is,

$$\sup_{u\in\mathcal{U}} \|\hat\beta(u) - \beta(u)\| \lesssim_P r. \qquad \text{(C.29)}$$



Proof of Lemma 2. Due to the convexity of the objective function, it suffices to show that
for any $\varepsilon > 0$, there exists $B < \infty$ such that

$$P\Big(\inf_{u\in\mathcal{U}} \inf_{\|\eta\|=1} \eta'\big[E_n[\psi_i(\beta,u)] + A_n(u)\big]\Big|_{\beta=\beta(u)+Br\eta} > 0\Big) \ge 1 - \varepsilon. \qquad \text{(C.30)}$$

Indeed, the quantity $E_n[\psi_i(\beta,u)] + A_n(u)$ is a subgradient of the objective function at $\beta$.
Observe that uniformly in $u \in \mathcal{U}$,

$$\sqrt{n}\,\eta'E_n[\psi_i(\beta(u) + Br\eta,u)] \ge \mathbb{G}_n(\eta'\psi_i(\beta(u),u)) + \eta'J_m(u)\eta\,B\sqrt{n}\,r - \epsilon_1(m,n) - \epsilon_2(m,n),$$

since $E[\psi_i(\beta(u),u)] = 0$ by definition of $\beta(u)$ (see argument in the proof of Lemma 3).
Invoking R2 and R3,

$$\epsilon_1(m,n) + \epsilon_2(m,n) \lesssim_P \sqrt{n}\,r,$$

and by R1, uniformly in $\eta \in S^{m-1}$ we have

$$|\mathbb{G}_n(\eta'\psi_i(\beta(u),u))| \le \sup_{u\in\mathcal{U}} \|\mathbb{G}_n(\psi_i(\beta(u),u))\| = \epsilon_0(m,n) \lesssim_P \sqrt{n}\,r.$$

Then the event of interest in (C.30) is implied by the event

$$\Big\{\eta'J_m(u)\eta\,B\sqrt{n}\,r - \epsilon_0(m,n) - \epsilon_1(m,n) - \epsilon_2(m,n) - \sqrt{n}\sup_{u\in\mathcal{U}} \|A_n(u)\| > 0\Big\},$$

whose probability can be made arbitrarily close to 1, for large $n$, by setting $B$ sufficiently
large since $\sup_{u\in\mathcal{U}} \|A_n(u)\| \lesssim_P r$, and $\eta'J_m(u)\eta \ge J > 0$ by the condition on the eigenvalues
of $J_m(u)$. □

Proof of Theorem 1. Let $\phi_n = \sup_{\alpha\in S^{m-1}} E[(Z'\alpha)^2] \vee E_n[(Z_i'\alpha)^2]$. Under $\zeta_m^2 \log n = o(n)$
and S.3, $\phi_n \lesssim_P 1$ by Corollary 2 in Appendix G.

Next recall that $R_{n,m} = \{(u,\beta) \in \mathcal{U} \times \mathbb{R}^m : \|\beta - \beta(u)\| \le Br\}$ for some fixed $B$ large
enough.

Under S.1-S.5, $\epsilon_0(m,n) \lesssim_P \sqrt{m\phi_n \log n}$ by Lemma 23, and $\epsilon_1(m,n) \lesssim_P \sqrt{m\phi_n \log n}$
by Lemma 22, where none of these bounds depend on $B$. Under S.1-S.5, $\epsilon_2(m,n) \lesssim
\sqrt{n}\,\zeta_m B^2r^2 + \sqrt{n}\,m^{-\kappa}Br \lesssim \sqrt{n}\,r$ by Lemma 24 provided $\zeta_m B^2 r = o(1)$ and $m^{-\kappa}B = o(1)$.

Finally, since $\zeta_m^2 m \log n = o(n)$ by the growth condition of the theorem, we can take
$r = \sqrt{(m \log n)/n}$ in Lemma 2 with $A_n(u) = 0$ and the result follows. □

C.2. Proof of Theorems 2, 3, 4, 5 and 6. In order to establish our uniform linear
approximations for the perturbed QR process, it is convenient to define the following
approximation error:

$$\epsilon_3(m,n) := \sup_{u\in\mathcal{U}} n^{1/2}\|E_n[\psi_i(\hat\beta(u),u)] + A_n(u)\|.$$

Lemma 3 (Uniform Linear Approximation). Suppose that $\hat\beta(u)$ is a minimizer of

$$E_n[\rho_u(Y_i - Z_i'\beta)] + A_n(u)'\beta$$

for each $u \in \mathcal{U}$, the data are in general position, and the conditions of Lemma 2 hold. The
unperturbed case corresponds to $A_n(\cdot) = 0$. Then

$$\sqrt{n}\,J_m(u)(\hat\beta(u) - \beta(u)) = \frac{-1}{\sqrt{n}}\sum_{i=1}^n \psi_i(\beta(u),u) - \sqrt{n}\,A_n(u) + r_n(u), \qquad \text{(C.31)}$$

$$\sup_{u\in\mathcal{U}} \|r_n(u)\| \lesssim_P \epsilon_1(m,n) + \epsilon_2(m,n) + \epsilon_3(m,n).$$



Proof of Lemma 3. First note that $E[\psi_i(\beta(u),u)] = 0$ by definition of $\beta(u)$. Indeed, despite
the possible approximation error, $\beta(u)$ minimizes $E[\rho_u(Y - Z'\beta)]$, which yields $E[\psi_i(\beta(u),u)] = 0$
by the first order conditions.

Therefore equation (C.31) can be recast as

$$r_n(u) = n^{1/2}J_m(u)(\hat\beta(u) - \beta(u)) + \mathbb{G}_n(\psi_i(\beta(u),u)) + \sqrt{n}\,A_n(u).$$

Next note that if $(u,\hat\beta(u)) \in R_{n,m}$ uniformly in $u \in \mathcal{U}$, then by the triangle inequality, uniformly
in $u \in \mathcal{U}$

$$\|r_n(u)\| \le \|\mathbb{G}_n(\psi_i(\hat\beta(u),u)) - \mathbb{G}_n(\psi_i(\beta(u),u))\|$$

$$+ n^{1/2}\|E[\psi_i(\hat\beta(u),u)] - E[\psi_i(\beta(u),u)] - J_m(u)(\hat\beta(u) - \beta(u))\|$$

$$+ n^{1/2}\|E_n[\psi_i(\hat\beta(u),u)] + A_n(u)\|$$

$$\le \epsilon_1(m,n) + \epsilon_2(m,n) + \epsilon_3(m,n).$$

The result follows by Lemma 2, which bounds from below the probability of $(u,\hat\beta(u)) \in R_{n,m}$
uniformly in $u \in \mathcal{U}$.

□

Proof of Theorem 2. We first control the approximation errors $\epsilon_0$, $\epsilon_1$, $\epsilon_2$, and $\epsilon_3$. Lemma
23 implies that $\epsilon_0(m,n) \lesssim_P \sqrt{m\phi_n \log n}$. Lemma 24 implies that

$$\epsilon_1(m,n) \lesssim_P \sqrt{m\zeta_m r \log n} + \frac{m\zeta_m \log n}{\sqrt{n}} \quad\text{and}\quad \epsilon_2(m,n) \lesssim_P \sqrt{n}\,\zeta_m r^2 + \sqrt{n}\,m^{-\kappa}r.$$

Next note that under the bounded density condition in S.2, the data are in general
position. Therefore, by Lemma 26 with probability 1 we have

$$\epsilon_3(m,n) \le \frac{m\zeta_m}{\sqrt{n}}.$$

Thus, the assumptions required by Lemma 3 follow under $m^3\zeta_m^2 \log n = o(n)$, the uniform
rate $r = \sqrt{(m \log n)/n}$, and the condition on $\kappa$.

Thus by Lemma 3 with $A_n(\cdot) = 0$

$$\sqrt{n}\,J_m(u)(\hat\beta(u) - \beta(u)) = \frac{-1}{\sqrt{n}}\sum_{i=1}^n \psi_i(\beta(u),u) + r_n(u), \text{ where}$$

$$\sup_{u\in\mathcal{U}} \|r_n(u)\| \lesssim_P \sqrt{m\zeta_m r \log n} + \frac{m\zeta_m \log n}{\sqrt{n}} + m^{-\kappa}\sqrt{m \log n} + \frac{m\zeta_m}{\sqrt{n}}.$$



Next, let

$$\tilde r_u := \frac{1}{\sqrt{n}}\sum_{i=1}^n Z_i\big(1\{Y_i \le Z_i'\beta(u) + R(X_i,u)\} - 1\{Y_i \le Z_i'\beta(u)\}\big).$$
By Lemma [25] sup u6W \\r u \\ <p y / m 1_K logn + m ^'^' g ~ ■ The result follows since U„(u) =^ 

Proof of Theorem 3. Note that

$$\sup_{u\in\mathcal{U}} \|r_n(u)\| \le \sup_{\alpha\in S^{m-1},\,u\in\mathcal{U}} |\alpha'(\hat J_m^{-1}(u) - J_m^{-1}(u))\alpha| \,\sup_{u\in\mathcal{U}} \|\mathbb{U}_n^*(u)\|.$$

To bound the second term, by Lemma 23

$$\sup_{u\in\mathcal{U}} \|\mathbb{U}_n^*(u)\| \lesssim_P \sqrt{m \log n} \qquad \text{(C.32)}$$

since $\phi_n \lesssim_P 1$ by Corollary 2 in Appendix G under $\zeta_m^2 \log n = o(n)$.

Next since $J_m(u)$ has eigenvalues bounded away from zero by S.2 and S.3, $J_m^{-1}(u)$
has eigenvalues bounded above by a constant. Moreover, by Lemma 4

$$\sup_{\alpha\in S^{m-1},\,u\in\mathcal{U}} |\alpha'(\hat J_m(u) - J_m(u))\alpha| \lesssim_P \epsilon_5(m,n) + \epsilon_6(m,n),$$

and $\epsilon_5(m,n) + \epsilon_6(m,n)$ is in turn bounded by Lemma 28 with $r = \sqrt{(m \log n)/n}$. Under the growth
conditions stated in the Theorem,

$$\sup_{\alpha\in S^{m-1},\,u\in\mathcal{U}} |\alpha'(\hat J_m(u) - J_m(u))\alpha| \le o_P\big(1/\sqrt{m \log^3 n}\big).$$

Thus, with probability approaching one, $\hat J_m^{-1}(u)$ has eigenvalues bounded above by a constant
uniformly in $u \in \mathcal{U}$.

Moreover, by the matrix identity $A^{-1} - B^{-1} = B^{-1}(B - A)A^{-1}$

$$|\alpha'(\hat J_m^{-1}(u) - J_m^{-1}(u))\alpha| = |\alpha'J_m^{-1}(u)(J_m(u) - \hat J_m(u))\hat J_m^{-1}(u)\alpha|$$

$$\le \|\alpha'J_m^{-1}(u)\| \sup_{\tilde\alpha\in S^{m-1}} |\tilde\alpha'(\hat J_m(u) - J_m(u))\tilde\alpha| \,\|\hat J_m^{-1}(u)\alpha\| \le o_P\big(1/\sqrt{m \log^3 n}\big) \qquad \text{(C.33)}$$

which holds uniformly over $\alpha \in S^{m-1}$ and $u \in \mathcal{U}$.

Finally, the stated bound on $\sup_{u\in\mathcal{U}} \|r_n(u)\|$ follows by combining the relations (C.32)
and (C.33). □
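The matrix identity used in the proof of Theorem 3 can be checked numerically. The following is a small sanity check on hypothetical $2\times 2$ matrices standing in for the population Jacobian and its estimate. It is not part of the argument, and the helper names are illustrative.

```python
def matmul(a, b):
    """Product of two matrices given as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def sub(a, b):
    """Entrywise difference a - b."""
    return [[x - y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]

def inv2(a):
    """Inverse of a 2x2 matrix via the adjugate formula."""
    det = a[0][0] * a[1][1] - a[0][1] * a[1][0]
    return [[a[1][1] / det, -a[0][1] / det],
            [-a[1][0] / det, a[0][0] / det]]

# Hypothetical values: J plays the role of J_m(u), J_hat of its estimate.
J = [[2.0, 0.3], [0.3, 1.5]]
J_hat = [[2.1, 0.25], [0.25, 1.4]]

# Identity with A = J_hat and B = J: A^{-1} - B^{-1} = B^{-1} (B - A) A^{-1}.
lhs = sub(inv2(J_hat), inv2(J))
rhs = matmul(matmul(inv2(J), sub(J, J_hat)), inv2(J_hat))
max_gap = max(abs(l - r) for row_l, row_r in zip(lhs, rhs)
              for l, r in zip(row_l, row_r))
```

The identity shows why a small error in the estimated matrix translates into a small error in its inverse once both inverses have bounded eigenvalues.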

Proof of Theorem 4. The first part of the proof is similar to the proof of Theorem 2 but
applying Lemma 3 twice, once to the unperturbed problem and once to the perturbed problem
with $A_n(u) = -\mathbb{U}_n^*(u)/\sqrt{n}$, for every $u \in \mathcal{U}$. Consider the set $E$ of realizations of the data
$\mathcal{D}_n$ such that $\sup_{\|\alpha\|=1} E_n[(\alpha'Z_i)^2] \lesssim 1$. By Corollary 2 in Appendix G and assumptions S.3
and S.4, $P(E) = 1 - o(1)$ under our growth conditions. Thus, by Lemma 23

$$\sup_{u\in\mathcal{U}} \|A_n(u)\| \lesssim_P r = \sqrt{\frac{m \log n}{n}}$$

where we used that $\phi_n \lesssim_P 1$ by Corollary 2 in Appendix G under $\zeta_m^2 \log n = o(n)$ and S.3.

Then,

$$\sqrt{n}\,J_m(u)(\hat\beta^*(u) - \hat\beta(u)) = \sqrt{n}\,J_m(u)(\hat\beta^*(u) - \beta(u)) - \sqrt{n}\,J_m(u)(\hat\beta(u) - \beta(u))$$

$$= \mathbb{U}_n^*(u) + r_n^{pert}(u) - r_n^{unpert}(u), \text{ where}$$

$$\sup_{u\in\mathcal{U}} \|r_n^{pert}(u) - r_n^{unpert}(u)\| \lesssim_P \sqrt{m\zeta_m r \log n} + \frac{m\zeta_m \log n}{\sqrt{n}} + m^{-\kappa}\sqrt{m \log n} + \frac{m\zeta_m}{\sqrt{n}}.$$

Note also that the results continue to hold in $P$-probability if we replace $P$ by $P^*$, since
if a random variable $B_n = O_P(1)$ then $B_n = O_{P^*}(1)$. Indeed, the first relation means that
$P(|B_n| > \ell_n) = o(1)$ for any $\ell_n \to \infty$, while the second means that $P^*(|B_n| > \ell_n) = o_P(1)$
for any $\ell_n \to \infty$. But the second clearly follows from the first by the Markov inequality,
observing that $E[P^*(|B_n| > \ell_n)] = P(|B_n| > \ell_n) = o(1)$. □

Proof of Theorem 5. The existence of the Gaussian process with the specified covariance
structure is trivial. To establish the additional coupling, note that under S.1-S.4 and the
growth conditions, $\sup_{\|\alpha\|=1} E_n[(\alpha'Z_i)^2] \lesssim_P 1$ by Corollary 2 in Appendix G. Conditioning
on any realization of $Z_1,\ldots,Z_n$ such that $\sup_{\|\alpha\|=1} E_n[(\alpha'Z_i)^2] \lesssim 1$, the existence of the
Gaussian process with the specified covariance structure that also satisfies the coupling
condition follows from Lemma 8. Under the conditions of the theorem $\sup_{u\in\mathcal{U}} \|\mathbb{U}_n(u) - G_n(u)\| = o_P(1/\log n)$. Therefore, the second statement follows since

$$\sup_{u\in\mathcal{U}} \|\sqrt{n}(\hat\beta(u) - \beta(u)) - J_m^{-1}(u)G_n(u)\|$$

$$\le \sup_{u\in\mathcal{U}} \|J_m^{-1}(u)\mathbb{U}_n(u) - J_m^{-1}(u)G_n(u)\| + o_P(1/\log n)$$

$$\le \sup_{u\in\mathcal{U}} \|J_m^{-1}(u)\| \sup_{u\in\mathcal{U}} \|\mathbb{U}_n(u) - G_n(u)\| + o_P(1/\log n) = o_P(1/\log n),$$

where we invoke Theorem 2 under the growth conditions, and that the eigenvalues of $J_m(u)$
are bounded away from zero uniformly in $n$ by S.2 and S.3.

The last statement proceeds similarly to the proof of Theorem 3.

□

Proof of Theorem Note that /3 (u) solves the quantile regression problem for the rescaled 
data {(hiYi, h{Zi) : 1 < i < n}. The weight hi is independent of (Yi,Zi), E[hi] = 1, 
E[hj) = 1, and maxi<j< n hi log 7i. That allows us to extend all results from /3(h) to f3 (it) 
replacing Cm by Cm = Cm log n to account for the larger envelope, and -0^ (/3, u) = hitpi(f3, u). 

The first part of the proof is similar to the proof of Theorem [2] but applying Lemma [3] 
twice, one to the original problem and one to the problem weighted by {hi}. Then 

n (>(u) - P(u)\ = (P\u) - P(uj) + (p(u) - P(uj\ 
j— it \ n 



£ - < 4%)}) - r „(«) 



where sup ugW ||r n («) - r£(u)|| <p \fmC, m r log 2 n + " + m- K yJm log n + mCT ^ g n ■ 

Next, let 

1 " 

r M := -= ^(hi - l)Zi(l{Yi < ZlP(u) + R{Xi,u)} - l{Y t < Z'^u)}). 

By Lemma l25l sup ugW ||r u || <p \J m l ~ K log n + ^"'"^ g n . The result follows since l{Ui < 
u} = l{Y i <Z' i p(u) + R(X i ,u)}. 

Note also that the results continue to hold in $P$-probability if we replace $P$ by $P^*$, since
if a random variable $B_n = O_P(1)$ then $B_n = O_{P^*}(1)$. Indeed, the first relation means that
$P(|B_n| > \ell_n) = o(1)$ for any $\ell_n \to \infty$, while the second means that $P^*(|B_n| > \ell_n) = o_P(1)$
for any $\ell_n \to \infty$. But the second clearly follows from the first by the Markov inequality,
observing that $E[P^*(|B_n| > \ell_n)] = P(|B_n| > \ell_n) = o(1)$.

The second part of the theorem follows by applying Lemma 8 with $v_i = h_i - 1$, where
$h_i \sim \exp(1)$ so that $E[v_i^2] = 1$, $E[|v_i|^3] \lesssim 1$ and $\max_{1\le i\le n} |v_i| \lesssim_P \log n$. Lemma 8 implies
that there is a Gaussian process $G_n(\cdot)$ with covariance structure $E_n[Z_iZ_i'](u \wedge u' - uu')$ such
that

$$\sup_{u\in\mathcal{U}} \Big\|\frac{1}{\sqrt{n}}\sum_{i=1}^n (h_i - 1)Z_i(u - 1\{U_i \le u\}) - G_n(u)\Big\| \lesssim_P o(1/\log n).$$

Combining the result above with the first part of the theorem, the second part follows by
the triangle inequality. □

C.3. Proof of Theorem 7 on Matrices Estimation. Consider the following quantities



€4(111, n) 
e 5 (m,n) 

e 6 (m,n) 



sup 

-1/2 



n 



sup I G n {l{\Yi - ZlP\ < K^a'Zif) 



sup 



1 



— E [l{\Yi - Zip] < h n }(a'Zi) 2 ] - a'J m {u)a 



where h n is a bandwidth parameter such that h n — > 0. 

Lemma 4 (Estimation of Variance Matrices and Jacobians). Under the definitions of $\epsilon_4$, $\epsilon_5$,
and $\epsilon_6$ we have

$$\sup_{\alpha\in S^{m-1}} |\alpha'(\hat\Sigma_m - \Sigma_m)\alpha| \le \epsilon_4(m,n), \qquad \sup_{\alpha\in S^{m-1},\,u\in\mathcal{U}} |\alpha'(\hat J_m(u) - J_m(u))\alpha| \le \epsilon_5(m,n) + \epsilon_6(m,n).$$

Proof. The first relation follows by the definition of $\epsilon_4(m,n)$ and the second by the triangle
inequality and definition of $\epsilon_5(m,n)$ and $\epsilon_6(m,n)$. □

Proof of Theorem 7. This follows immediately from Lemma 4, Lemma 27 and Lemma 28.

□



Appendix D. Proofs of Theorems 8 and 9

Proof of Theorem 8. By Theorem 2 and P1

\6(u,w) - 6(u,w)\ < \t(u)'0(u) - p(u))\ + |r„(«,«;)| 

< +OP (fgL) +o( pHi|/Vi) 



< ^ ^™ + op [gj + (&(m, to)/0i) 

where the last inequality follows by < £g(m,w) assumed in P2. Finally, since 

E[\£(w)'J-\u)V n (v,)\ 2 ] < ||£(«;)|| 2 ||J r - 1 (u)|| 2 sup aeS ™-i J B[(a'^) 2 ] < e e (m,w), the result 
follows by applying the Chebyshev inequality to establish that \£{w) J~ 1 (u)U n (ti)| <p 
£ e (m,w). □ 

Proof of Theorem^ By Assumption PI and Theorem [2] 

U U ,W) = ^P^m + = iM'J^MV (u) / P( W )|| \ / ||£W|| 

To show that the last two terms are op(l), note that under conditions S.2 and S.3, & n (u, w) >p 
||^(u')||/y / n since a n (u, w) = (1 + op(l))a n (u, w) by Theorem [71 

Finally, for any e > 0, E[(£(w)' J m x {u)Zi) 2 j a 2 n {u, w)l{\£(w)' 3 m x (u)Zi\ > e^na n (u,w)}] -> 
under our growth conditions. Thus, by the Lindeberg-Feller central limit theorem and
$\hat\sigma_n(u,w) = (1 + o_P(1))\sigma_n(u,w)$, it follows that

y/na n (u,w) 

and the first result follows. 



The remaining results for i*(u, u>) follow similarly under the corresponding conditions. 

□ 



Appendix E. Proofs of Theorems 10-12

Lemma 5 (Entropy Bound). Let W C M. d be a bounded set, and I : W — > M. m be a mapping 
in W, such that for ^(m) and ^g{m) > 1, 

\\l(w)\\ < Z, e (m) and \\£(w) - £(w)\\ < £%(m)\\w - w\\, for all w , w G W . 

Moreover, assume that \\J m (u) — J m (u')\\ < £(m)\u — u'\ foru,u' in the operator norm, 
£(m) > 1, and let \i = sup ugW ||J m (/u) _1 1| > 0. Let C m be the class of functions 

C m = {fu,M = £{w)'J m l (u)g : ueU, weW}U {f(g) = 1} 



where g E B(0,(m), and let F denote the envelope of C m , i-e. F = supy g £ m |/|. Then, for 
any e < eq, the uniform entropy of C m is bounded by 

at( iiz.il r t <nw s ^o + diam(W x W)K\ d+1 
supiV(>||F||Q ; 2,£ m ,L2(Q)) < ^ ^ —j 

where K := fJ,£g( m + /u%(m)£(m)C m . 

Proof. The uniform entropy bound is based on the proof of Theorem 2 of Andrews [lj. We 
first exclude the function constant equal to 1 and compute the Lipschitz constant associated 
with the other functions in C m as follows 

\l{w)'J-\u)g - l{w)'J m \u)g\ = \(£(w) - l(w))> J-\u)g + l{w)'{J m \u) - J- l {u))g\ 

= \{l{w)-l{w))'J-\u)g\+ 

+\£(w)' ' J^(u)(J m (u) - J m (u))Jm(u)g\ 

< ${rn)\\w- w\\\\ J m 1 (u)g\\ + 

+ \\£{w)'J m 1 (u)\\ t(m)\u-u\ \\J- l {u)g\\ 

< K (\\w — w\\ + \u — u\) 

where K := fJ^JfCm + M%(m)£(™<)Cm- 

Consider functions fa^wj, where (uj, uJj)'s are points at the centers of disjoint cubes of 
diameter e||.F||q whose union covers U x W. Thus, for any (u,w) € x W 

min \\f uw - fu.wA\Q,2 < K min(||u; - Wj\\ + \Hj - u\) < e||F||Q, 2 . 

Uj,Wj V-j,Wj 

Adding (if necessary) the constant function in to the cover, for any measure Q we obtain 

N(e\\F\\ Q>2 ,£ m ,L 2 (Q)) < 1 + (diam(W x W)K/e\\F\\ Q>2 ) d+1 . 

The result follows by noting that e < so and that ||-F||q,2 > 1 since C m contains the function 
constant equal to 1. □ 

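Lemma 6. Let $\mathcal{F}$ and $\mathcal{G}$ be two classes of functions with envelopes $F$ and $G$ respectively.
Then the entropy of $\mathcal{FG} = \{fg : f \in \mathcal{F}, g \in \mathcal{G}\}$ satisfies

$$\sup_Q N(\epsilon\|FG\|_{Q,2}, \mathcal{FG}, L_2(Q)) \le \sup_Q N((\epsilon/2)\|F\|_{Q,2}, \mathcal{F}, L_2(Q)) \,\sup_Q N((\epsilon/2)\|G\|_{Q,2}, \mathcal{G}, L_2(Q)).$$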

Proof. The proof is similar to Theorem 3 of Andrews [1]. For any measure $Q$ we denote
$c_F = \|F\|_{Q,2}^2$ and $c_G = \|G\|_{Q,2}^2$. Note for any measurable set $A$, $Q_F(A) = \int_A F^2(x)\,dQ(x)/c_F$
and $Q_G(A) = \int_A G^2(x)\,dQ(x)/c_G$ are also measures. Let

$$K = \sup_{Q_F} N(\epsilon\|G\|_{Q_F,2}, \mathcal{G}, L_2(Q_F)) = \sup_Q N(\epsilon\|G\|_{Q,2}, \mathcal{G}, L_2(Q)) \quad\text{and}$$

$$L = \sup_{Q_G} N(\epsilon\|F\|_{Q_G,2}, \mathcal{F}, L_2(Q_G)) = \sup_Q N(\epsilon\|F\|_{Q,2}, \mathcal{F}, L_2(Q)).$$

Let g\ , . . . , gx and fi, ■ ■ ■ , /l denote functions in Q and T used to build a cover of Q and 
T of cubes with diameter eUFGHg^. Since F > \ f\ and G > \g\ we have 

min \\fg- ft9kh,2 < min 11/(5 - 9k)\\Q,2 + bkif ~ fi)\\Q,2 

£<L,k<K t<L,k<K 

< min \\F(g - g k )\\ Q;2 + mm \\G(f - f e )\\ Qt2 

k<K £<L 

= min 11(9 - 9k)h F ,2 + mm ||(/ - fi)\\ QG}2 

hi i\ ii < ^~. Li 

< e ||G||Q F!2 + £ ||F|| Qoi2 = 2 £ ||FG|| Qj2 . 

Therefore, by taking pairwise products of $g_1,\ldots,g_K$ and $f_1,\ldots,f_L$ to create a net we
have

$$N(\epsilon\|FG\|_{Q,2}, \mathcal{FG}, L_2(Q)) \le N((\epsilon/2)\|F\|_{Q,2}, \mathcal{F}, L_2(Q)) \,N((\epsilon/2)\|G\|_{Q,2}, \mathcal{G}, L_2(Q)).$$

The result follows by taking the supremum over $Q$ on both sides. □

Proof of Theorem \1(A By the triangle inequality 

sup \6(u,w) — 9(u,w)\ < sup \£{w)' (f5{u) — P(u))\ + sup \r n (u,w)\ 

(u,w)£l (u,w)dl (u,w)£l 

where the second term satisfies sup^ u w ^ &1 \ r n (u, w)/\\£(w)\\ \ = o(n -1 / 2 log -1 n) by condi- 
tion U.l. 

By Theorem [21 the first term is bounded uniformly over / by 

\l(w)'@(u)-f}(u))\ <p ^Hj^WN +0p ( 6(m>/ )/[ v ^i ogn ]) (E.34) 

\ n 



since ||£(u>)|| < £g(m,I) by U.2 and the remainder term of the linear representation in 
Theorem [2] satisfies sup ugW ||r n (u)|| = op ( 1/ log n). 

Given Z{ = Z{Xi) consider the classes of functions 
?m = {£(u}yj-\u)Zi : (u,w) el}, 9 = {(l{Ui < u}-u) : ueU} and C m = F m U{f(Zi) 

By Lemma [5] and Lemma [6] the entropy of the class of function C m g = {fg : f € C m , g £ 
g} satisfies, for K < (&(m,I) V l){${mj) V 1)(C™ V 1), 

suplogiV(e||F|| 2i Q,£ m g,L 2 (Q)) < (d + l)log[(l + diam(7)if)/ £ ] < log(n/e) 
Q 

by our assumptions. Noting that under S.3 



sup \Ju{l - u)£(w)'J m 1 (u)'E m J m 1 (u)£(w) < £e(m,I), 

(u,w)£l 
Lemma [TBI applied to C m Q with J(m) = ^/dlogn, M[m,n) < 1 V ^(m, I)( m yields 

\t( W )'J m (u)-^ n {u)\ < P ( 1 V tfim, I) + dl ° S(n) (lV^(m,/)^)log» \ 1/2 bgl/2 ; 



sup 

(u.w)ei \ n 

where the second term in the parenthesis is negligible compared to the first under our 
growth conditions. The result follows using this bound into (|E.34p . 

□ 

Proof of Theorem\T]\ Under the conditions of Theorems [2] and El sup ugW |maxeig( J~ 1 (n) — 
J~ 1 (u))| <p o{\/y/m log 3//2 n), and a n (u,w) = (1 + op(l/^/mlog 3 ^ 2 n))a n (u, w) uniformly 
in (u,w) G /. Note that a n (u,w) >p \\£(w)\\/y/n. Next define i* n (u, w) as follows: 

£{w)'J m \u)W n {u)/^i 



For pivotal and gradient bootstrap coupling: t* n {u,w) 
For Gaussian and weighted bootstrap coupling: t* n {u,w) 



<T n (u,w) 
tiwyj^i^Gniu)/^ 



a n (u,w) 

Note that t* n is independent of the data conditional on the regressor sequence Z n = (Z%, Z n ), 
unlike t* which has some dependence on the data through various estimated quantities. 

For the case of pivotal and gradient bootstrap couplings by Theorem [2] and condition U 

t n (u,w) = d t* n (u,w) + op(l/logn) in £°°(I). 

Moreover, for the case of Gaussian and weighted bootstrap couplings under the conditions 
of Theorem [5] 

t n (u,n) = d i* n (u,w) + op (1/ log n) m£°°(I). 
Finally, under the growth conditions 

t* n {u,w) = t* n (u,w) + o P {l/logn) in £°°(I). 

Thus, it follows that uniformly in (u, w) £ I 

t n (u, w) = d t* n (u, w) + op(l/ log n) = t n (u, w) + o P (l/ log n) 

and the result follows. 

□ 

Proof of Theorem [TB. Let e n = 1/logn, and S n such that <5 n log 1//2 n — > 0, and 5 n /e n — > oo. 
Step 1. By the proof of Theorem [TT1 there is an approximation = sup^ uw ^j \t n (u, w)\ 
t° \\t*n\\i = su P(n,to)e-T \Kii u i w )\-> which does not depend on the data conditional on the re- 
gressor sequence Z n = [Z\, Z n ), such that 

^(1 K\\i-\K\\ I \<e n ) = l-o(l). 

Now let 

k n (l — a) := (1 — a) — quantile of conditional on T> n , 

and let 

K n {l — a) := (1 — a) — quantile of conditional on T> n . 

Note that since \\i n \\j is conditionally independent of Y\, . . . , Y n , 

K n (l — a) = (1 — a) — quantile of conditional on Z n . 

Then applying Lemma [7J to ||i* \\j and we get that for some u n \ 

P[n n (p) > k n (p - v n ) - e n and k n (p) > n n (p - v n ) - e n ] = 1 - o(l). 

Step 2. Claim (1) now follows by noting that 

P{\\t n \\i > k n (l - a) + 6 n } < P{\\tn\\i >K n (l- a -v n )- e n + 5 n } + o(l) 

< P{||C||/ > K n (l - a - v n ) - 2e n + 5 n } + o(l) 

< P{\\fji > K(l -a- 2v n ) - 3e n + 5 n } + o(l) 

< P{\\t* n \\i>k n (l-a-2u n )} + o(l) 

= E P [P{\\t*Ji > kn(l -a- 2v n )\V n }} + o(l) 

< E P [a + 2v n ] +o(l) = a + o{l). 

Claim (2) follows from the equivalence of the event {8(u,w) £ [L(u,w),'C(u,w)], for all 
(u,w) € 1} and the event {||t n ||/ < c n (l — a)}. 

To prove Claim (3) note that a n (u,w) = (1 + op(l))a n (u,w) uniformly in (u,w) £ I 
under the conditions of Theorems [2] and [7J Moreover, c n (l — a) = k n (l — a)(l + op(l)) 
because l/k n (l — a) <p 1 and 5 n — > 0. Combining these relations the result follows. 

Claim (4) follows from Claim (1) and from the following lower bound. 

By Lemma [7J we get that for some v n \ 

P[nn(p + v n ) +£n> k n {p) and k n (p + v n ) + e n > K n (p)\ = 1 - o(l). 
Then 

P{\\t n \\i>k n O--a) + S n } > P{\\t n \\i>K n (l-a + v n ) + e n + 6 n }-o(l) 

> P{\K\\i > Kn(l -a + v n ) + 2e n + 5 n ] - o(l) 

> P{\K\\i > K(l -a + 2u n ) + 3e n + 5 n } - o(l) 

> P{\\t* n \\l > Ml " « + 2 ^n) + 2 <U " "(I) 

> E[P{\\t* n \\i > Ml - « + 2 ^n) + 25 n |P n }] - 0(1) 
= a — 2^ n — o(l) = a — o(l), 

where we used the anti-concentration property in the last step. □

We use the following lemma in the proof of Theorem 12.



Lemma 7 (Closeness in Probability Implies Closeness of Conditional Quantiles). Let $X_n$
and $Y_n$ be random variables and $\mathcal{D}_n$ be a random vector. Let $F_{X_n}(x|\mathcal{D}_n)$ and $F_{Y_n}(x|\mathcal{D}_n)$
denote the conditional distribution functions, and $F_{X_n}^{-1}(p|\mathcal{D}_n)$ and $F_{Y_n}^{-1}(p|\mathcal{D}_n)$ denote the
corresponding conditional quantile functions. If $|X_n - Y_n| = o_P(\varepsilon)$, then for some $\nu_n \searrow 0$,
with probability converging to one

$$F_{X_n}^{-1}(p|\mathcal{D}_n) \le F_{Y_n}^{-1}(p + \nu_n|\mathcal{D}_n) + \varepsilon \quad\text{and}\quad F_{Y_n}^{-1}(p|\mathcal{D}_n) \le F_{X_n}^{-1}(p + \nu_n|\mathcal{D}_n) + \varepsilon, \quad \forall p \in (\nu_n, 1 - \nu_n).$$

Proof. We have that for some $\nu_n \searrow 0$, $P\{|X_n - Y_n| > \varepsilon\} = o(\nu_n)$. This implies that
$P[P\{|X_n - Y_n| > \varepsilon|\mathcal{D}_n\} \le \nu_n] \to 1$, i.e. there is a set $\Omega_n$ such that $P(\Omega_n) \to 1$ and
$P\{|X_n - Y_n| > \varepsilon|\mathcal{D}_n\} \le \nu_n$ for all $\mathcal{D}_n \in \Omega_n$. So, for all $\mathcal{D}_n \in \Omega_n$

$$F_{X_n}(x|\mathcal{D}_n) \ge F_{Y_n+\varepsilon}(x|\mathcal{D}_n) - \nu_n \quad\text{and}\quad F_{Y_n}(x|\mathcal{D}_n) \ge F_{X_n+\varepsilon}(x|\mathcal{D}_n) - \nu_n, \quad \forall x \in \mathbb{R},$$

which implies the inequality stated in the lemma, by definition of the conditional quantile
function and equivariance of quantiles to location shifts. □
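The mechanism behind Lemma 7 can be illustrated on a finite sample, with empirical distributions playing the role of the conditional ones. The sketch below is only an illustration under stated assumptions: the pairs are simulated so that at most a share `nu` of them differ by more than `eps`, and the quantile is the left-continuous empirical quantile. All function names and parameter values are hypothetical.

```python
import math
import random

def emp_quantile(sample, p):
    """Empirical quantile inf{x : F(x) >= p} for 0 < p <= 1."""
    xs = sorted(sample)
    # The small offset guards against floating-point round-up of p * n.
    k = max(1, math.ceil(p * len(xs) - 1e-9))
    return xs[k - 1]

def coupled_samples(n, eps, nu, seed=0):
    """Draw pairs with |x_i - y_i| <= eps except for at most a share nu."""
    rng = random.Random(seed)
    n_bad = int(nu * n)
    x, y = [], []
    for i in range(n):
        xi = rng.gauss(0.0, 1.0)
        if i < n_bad:
            yi = xi + rng.uniform(-5.0, 5.0)              # may be far apart
        else:
            yi = xi + 0.9 * eps * rng.uniform(-1.0, 1.0)  # within eps
        x.append(xi)
        y.append(yi)
    return x, y

def quantiles_are_close(x, y, eps, nu, levels):
    """Check both inequalities of the lemma at each level p in levels."""
    return all(
        emp_quantile(x, p) <= emp_quantile(y, p + nu) + eps
        and emp_quantile(y, p) <= emp_quantile(x, p + nu) + eps
        for p in levels
    )

x, y = coupled_samples(n=1000, eps=0.1, nu=0.05)
levels = [k / 10 for k in range(1, 10)]  # levels p inside (nu, 1 - nu)
ok = quantiles_are_close(x, y, eps=0.1, nu=0.05, levels=levels)
```

The shift by `nu` in the quantile level absorbs the pairs that are far apart, and the additive `eps` absorbs the small discrepancies in the remaining pairs, mirroring the two terms in the lemma.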

Appendix F. A Lemma on Strong Approximation of an Empirical Process of an Increasing Dimension by a Gaussian Process

Lemma 8. (Approximation of a Sequence of Empirical Processes of Increasing Dimension
by a Sequence of Gaussian Processes) Consider the empirical process $\mathbb{U}_n$ in $[\ell^\infty(\mathcal{U})]^m$, $\mathcal{U} \subset
(0, 1)$, conditional on $Z_i \in \mathbb{R}^m$, $i = 1,\ldots,n$, defined by

$$\mathbb{U}_n(u) = \mathbb{G}_n(v_iZ_i\psi_i(u)), \qquad \psi_i(u) = u - 1\{U_i \le u\},$$

where $U_i$, $i = 1,\ldots,n$, is an i.i.d. sequence of standard uniform random variables, $v_i$, $i =
1,\ldots,n$, is an i.i.d. sequence of real valued random variables such that $E[v_i^2] = 1$, $E[|v_i|^3] \lesssim
1$, and $\max_{1\le i\le n} |v_i| \lesssim_P \log n$. Suppose that $Z_i$, $i = 1,\ldots,n$, are such that

sup E n [(a'Zi) 2 ] < 1, max ||Zj|| < Cm, ra r Cm lo S 22 ™ = 

|| Q ||<1 " i<«<« 

There exists a sequence of zero-mean Gaussian processes $G_n$ with a.s. continuous paths,
that has the same covariance functions as $\mathbb{U}_n$, conditional on $Z_1,\ldots,Z_n$, namely,

$$E[G_n(u)G_n(u')'] = E[\mathbb{U}_n(u)\mathbb{U}_n(u')'] = E_n[Z_iZ_i'](u \wedge u' - uu'), \text{ for all } u \text{ and } u' \in \mathcal{U},$$

and that approximates the empirical process $\mathbb{U}_n$, namely,

$$\sup_{u\in\mathcal{U}} \|\mathbb{U}_n(u) - G_n(u)\| \lesssim_P o\Big(\frac{1}{\log n}\Big).$$
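The covariance kernel in Lemma 8 comes from the second moments of $\psi_i(u) = u - 1\{U_i \le u\}$ with $U_i$ standard uniform. A short sketch follows, in which $E[\psi_i(u)\psi_i(u')]$ is computed exactly by splitting the unit interval at $u \wedge u'$ and $u \vee u'$, and the resulting matrix kernel is assembled for a hypothetical design. The function names and the example rows of `Z` are assumptions made for illustration.

```python
def psi_second_moment(u, up):
    """E[psi(u) psi(u')] for psi(u) = u - 1{U <= u}, U ~ Uniform(0, 1)."""
    lo, hi = min(u, up), max(u, up)
    below = (lo - 1.0) * (hi - 1.0) * lo    # U <= lo: both indicators are 1
    between = lo * (hi - 1.0) * (hi - lo)   # lo < U <= hi: one indicator is 1
    above = lo * hi * (1.0 - hi)            # U > hi: both indicators are 0
    return below + between + above

def gaussian_cov(Z, u, up):
    """Matrix E_n[Z_i Z_i'] * (min(u, u') - u u') for rows Z_i of Z."""
    n, m = len(Z), len(Z[0])
    scale = min(u, up) - u * up
    return [[scale * sum(z[a] * z[b] for z in Z) / n for b in range(m)]
            for a in range(m)]

grid = [0.1, 0.25, 0.5, 0.75, 0.9]
kernel_gap = max(abs(psi_second_moment(u, up) - (min(u, up) - u * up))
                 for u in grid for up in grid)
cov = gaussian_cov([[1.0, 0.0], [1.0, 2.0]], 0.5, 0.5)
```

The scalar factor $u \wedge u' - uu'$ is the covariance function of a Brownian bridge, so conditional on the regressors the approximating process behaves like an $m$-dimensional Brownian bridge with covariance weight $E_n[Z_iZ_i']$.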

Proof. The proof is based on the use of maximal inequalities and Yurinskii's coupling. 
Throughout the proof all the probability statements are conditional on Z\, . . . , Z n . 

We define the sequence of projections ttj : U — > U, j = 0, 1, 2, . . . , oo by TTj(u) = 2 k ~ l /2^ 
if u G ({2 k — 2)/2i , 2 fe /2 J ), k = 1, . . . ,j, and ^-(u) = n if n = or 1. In what follows, given 
a process G in [£°°(£/)] m and its projection G o ttj, whose paths are step functions with 
2- ? steps, we shall identify the process G o ttj with a random vector G o ttj in M. 2Jm , when 
convenient. Analogously, given a random vector W in R 2Jm we identify it with a process 

in [£°°(U)] m , whose paths are step functions with 2 J steps. 

The following relations proven below: 

(1) (Finite-Dimensional Approximation) 

1 

log n / 

(2) (Coupling with a Normal Vector) there exists N n j =d N(0, var[U n o ttj]) such that, 



ri = sup ||U n (u) - U n o TTj(u)\\ < P o 
ueu 



r 2 = \Wnj - U„ o TTjIla <P o (j^j 



(3) (Embedding a Normal Vector into a Gaussian Process) there exists a Gaussian 
process G n with properties stated in the lemma such that N n j = G n o 7Tj a.s.; 

(4) (Infinite-Dimensional Approximation) 



r 3 = sup ||G n (n) - G n o7r,-(u)|| <p o ( 

u&A \ iog 



.-,(-) 



The result then follows from the triangle inequality 

sup \\V n {u) - G n (u)\\ <n+r 2 + r 3 . 
ueu 

Relation (1) follows from 

ri = sup ||U n (u) - U n o TTj(u)\\ < sup \\U n (u) -U n (u') 
ueU \u-u'\<2-i 



< P y/2-imlogn + J m2ei ° g4re < P o(l/ log n), 
V n 

where the last inequality holds by Lemma [9J and the final rate follows by choosing here 
2 J = (m log 3 n)l n for some £ n — > oo slowly enough. 

Relation (2) follows from the use of Yurinskii's coupling (Pollard [31] . Chapter 10, The- 
orem 10): Let £i,...,£ n be independent p-vectors with E[£i\ = for each i, and with 

K := Yli E [ll£«ll 3 ] finite. Let S = £i H h £ n . For each 5 > there exists a random vector 

T with a N(0, var(S)) distribution such that 

P{\\S - T\\ > 35} < C B (l + |l0S ^ /jB)l ^ where B := K p<T 3 , 

for some universal constant Cq. 

In order to apply the coupling, we collapse ViZ^i o ttj to a p- vector, and let 

£i = ViZiipi o nj G K p , p = 2 J m 

so that U n o 7Tj = Y27=l Then 

2J m 



3/2' 



< 2 3 ^ 2 £[|^|3]E n [||^|| 3 ] <2 3 ^. 



E n £[||&|| 3 ] = En£ |X)Z)^(«*i) 2 «i^ 

i fc = l to = l 

Therefore, by Yurinskii's coupling, since logn < 2 J m, by the choice 2- ? = mlog 3 n, 



<5 3 nV2 



by setting 5 



m log n I . This verifies relation (2) with 



r2 <P 



n 



log n 



1/6 



o(l/ logn), 



provided j is chosen as above. 
Relation (3) follows from the a.s. embedding of a finite-dimensional random normal
vector into a path of a continuous Gaussian process, which is possible by Lemma 11.

Relation (4) follows from 

r 3 = sup \\G n (u) - G n o iTj(u)\\ < sup \\G n (u) - G n {u')\\ 

u&U \u-u'\<2~i 



< P a/2 J'mlogn < P o(l/ log n), 
where the last inequality holds by Lemma [10] since by assumption of this lemma, 

sup EE n [vfia'Zi) 2 ] = sup E n [(a'Zi) 2 ] < 1 

||a||<l l|a||<l 

and the rate follows from setting j as above. 

Note that putting bounds together we also get an explicit bound on the approximation 
error: 

sup ||U n («) - G n {u) || <p ^H^n + > 2C - lQg4n + ( V6 , 
ueu V n V n J 

□ 

Next we establish the auxiliary relations (1) and (4) appearing in the preceding proof. 

Lemma 9 (Finite-Dimensional Approximation). Let Z\, . . . , Z n E M m be such that maxj< n \\Z,, 
Cm, and cp = sup|| a || <1 E n [(a'Zj) 2 ], let Vi be i.i.d. random variables such that E[vf] = 1 and 
maxi<j< n \vi\ <p logn, and let ij)i(u) = u — l{Ui < u}, i = 1, . . . ,n, where U\, . . . , U n are 
i.i.d. Uniform(0, 1) random variables. Then, for 7 > 0, andV n (u) = G n (v iZiipi(u)), 



sup \a' (U n (u) -U n (u'))| <p y / ~ftpmlogn + 



m 2 ( m log n 



n 



Proof. For notational convenience let A n := y m '""J" 6 - ■ Using the second maximal in- 
equality of Lemma [TBI with M(m,n) = £ m logn 

e(m,n,7) = sup \a (V n (u) - U n (u')) | 

< P A/mlogn sup \J EE n [vf(a'Zi) 2 (i()i(u) - ^{u')) 2 } + A n . 
Il^ll^ijl 11- n 'l^7 

By the independence between Zj, V{ and C/j, and E[v 2 ] = 1, 



e(m, n, 7) <p a/ V" 77 - l°g n SU P 

|u— u'|<7 
Since $U_i \sim$ Uniform(0, 1) we have $(\psi_i(u) - \psi_i(u'))^2 =_d (|u - u'| - 1\{U_i \le |u - u'|\})^2$. Thus,
since | u — u' | < 7 

e(m,n,-y) < P y 7 (pm log ny/ 7(1 - 7) + A n . 

□ 
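
The variance identity used in the last step of the proof of Lemma 9 can be checked numerically. The sketch below is only an illustration and not part of the argument: it approximates $E[(\psi_i(u) - \psi_i(u'))^2]$ by averaging over a fine midpoint grid of values for $U_i$ on (0, 1) and compares the result with $d(1-d)$, $d = |u - u'|$. The function names and the grid size are assumptions made for this example.

```python
def psi(u, U):
    """Score psi(u) = u - 1{U <= u} for a single uniform draw U."""
    return u - (1.0 if U <= u else 0.0)


def second_moment_of_increment(u, u_prime, grid_size=200_000):
    """Approximate E[(psi(u) - psi(u'))^2] for U ~ Uniform(0, 1).

    The expectation over U is replaced by an average over the midpoints
    of an equally spaced grid on (0, 1), i.e. a deterministic quadrature.
    """
    total = 0.0
    for k in range(grid_size):
        U = (k + 0.5) / grid_size
        diff = psi(u, U) - psi(u_prime, U)
        total += diff * diff
    return total / grid_size


def closed_form(u, u_prime):
    """d * (1 - d) with d = |u - u'|, the second moment of d - Bernoulli(d)."""
    d = abs(u - u_prime)
    return d * (1.0 - d)
```

Since $d(1-d)$ is increasing on $[0, 1/2]$, the bound $\gamma(1-\gamma)$ applies to all $|u - u'| \leq \gamma$ when $\gamma \leq 1/2$.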

Lemma 10 (Infinite-Dimensional Approximation). Let $G_n : \mathcal{U} \to \mathbb{R}^m$ be a zero-mean Gaussian process whose covariance structure conditional on $Z_1, \ldots, Z_n$ is given by

$$E[G_n(u) G_n(u')'] = \mathbb{E}_n[Z_i Z_i'](u \wedge u' - u u')$$

for any $u, u' \in \mathcal{U} \subset (0,1)$, where $Z_i \in \mathbb{R}^m$, $i = 1, \ldots, n$. Then, for any $\gamma > 0$ we have

$$\sup_{|u-u'| \leq \gamma} \|G_n(u) - G_n(u')\| \lesssim_P \sqrt{\varphi \gamma m \log m},$$

where $\varphi = \sup_{\|a\| \leq 1} \mathbb{E}_n[(a'Z_i)^2]$.

Proof. We will use the following maximal inequality for Gaussian processes (Proposition A.2.7 of [37]). Let $X$ be a separable zero-mean Gaussian process indexed by a set $T$. Suppose that for some $K > \sigma(X) = \sup_{t \in T} \sigma(X_t)$, $0 < \epsilon_0 \leq \sigma(X)$, we have

$$N(\varepsilon, T, \rho) \leq \left(\frac{K}{\varepsilon}\right)^V, \quad \text{for } 0 < \varepsilon < \epsilon_0,$$

where $N(\varepsilon, T, \rho)$ is the covering number of $T$ by $\varepsilon$-balls with respect to the standard deviation metric $\rho(t,t') = \sigma(X_t - X_{t'})$. Then there exists a universal constant $D$ such that for every $\lambda \geq \sigma^2(X)(1 + \sqrt{V})/\epsilon_0$,

$$P\left(\sup_{t \in T} X_t \geq \lambda\right) \leq \left(\frac{D K \lambda}{\sqrt{V} \sigma^2(X)}\right)^V \bar\Phi(\lambda/\sigma(X)),$$

where $\bar\Phi = 1 - \Phi$, and $\Phi$ is the cumulative distribution function of a standard Gaussian random variable.

We apply this result to the zero-mean Gaussian process $X_n : S^{m-1} \times \mathcal{U} \times \mathcal{U} \to \mathbb{R}$ defined as

$$X_{n,t} = a'(G_n(u) - G_n(u')), \quad t = (a, u, u'), \quad a \in S^{m-1}, \quad |u - u'| \leq \gamma.$$

It follows that $\sup_{t \in T} X_{n,t} = \sup_{|u-u'| \leq \gamma} \|G_n(u) - G_n(u')\|$.

For the process $X_n$ we have:

$$\sigma(X_n) \leq \sqrt{\gamma \sup_{\|a\| \leq 1} \mathbb{E}_n[(a'Z_i)^2]}, \quad K \lesssim \sqrt{\sup_{\|a\| \leq 1} \mathbb{E}_n[(a'Z_i)^2]}, \quad \text{and} \quad V \lesssim m.$$

Therefore the result follows by setting $\lambda \simeq \sqrt{\gamma m \log m \sup_{\|a\| \leq 1} \mathbb{E}_n[(a'Z_i)^2]}$. □

In what follows, as before, given a process $G$ in $[\ell^\infty(\mathcal{U})]^m$ and its projection $G \circ \pi_j$, whose paths are step functions with $2^j$ steps, we shall identify the process $G \circ \pi_j$ with a random vector $G \circ \pi_j$ in $\mathbb{R}^{2^j m}$, when convenient. Analogously, given a random vector $W$ in $\mathbb{R}^{2^j m}$ we identify it with a process $W$ in $[\ell^\infty(\mathcal{U})]^m$, whose paths are step functions with $2^j$ steps.

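Lemma 11 (Construction of a Gaussian Process with a Prescribed Projection). Let $\mathcal{N}_j$ be a given random vector such that

$$\mathcal{N}_j =_d \tilde G \circ \pi_j \sim N(0, \Sigma_j),$$

where $\Sigma_j := \mathrm{Var}[\mathcal{N}_j]$ and $\tilde G$ is a zero-mean Gaussian process in $[\ell^\infty(\mathcal{U})]^m$ whose paths are a.s. uniformly continuous with respect to the Euclidean metric $|\cdot|$ on $\mathcal{U}$. There exists a zero-mean Gaussian process $G$ in $[\ell^\infty(\mathcal{U})]^m$, whose paths are a.s. uniformly continuous with respect to the Euclidean metric $|\cdot|$ on $\mathcal{U}$, such that

$$\mathcal{N}_j = G \circ \pi_j \quad \text{and} \quad G =_d \tilde G \text{ in } [\ell^\infty(\mathcal{U})]^m.$$

Proof. Consider a vector $\tilde G \circ \pi_\ell$ for $\ell > j$. Then $\tilde{\mathcal{N}}_j = \tilde G \circ \pi_j$ is a subvector of $\tilde G \circ \pi_\ell = \tilde{\mathcal{N}}_\ell$. Thus, denote the remaining components of $\tilde{\mathcal{N}}_\ell$ as $\tilde{\mathcal{N}}_{\ell \setminus j}$. We can construct an identically distributed copy $\mathcal{N}_\ell$ of $\tilde{\mathcal{N}}_\ell$ such that $\mathcal{N}_j$ is a subvector of $\mathcal{N}_\ell$. Indeed, we set $\mathcal{N}_\ell$ as a vector with components

$$\mathcal{N}_j \quad \text{and} \quad \mathcal{N}_{\ell \setminus j},$$

arranged in appropriate order, namely that $\mathcal{N}_\ell \circ \pi_j = \mathcal{N}_j$, where

$$\mathcal{N}_{\ell \setminus j} = \Sigma_{\ell \setminus j, j} \Sigma_{j,j}^{-1} \mathcal{N}_j + \eta_j,$$

where $\eta_j \perp \mathcal{N}_j$ and $\eta_j =_d N(0, \Sigma_{\ell \setminus j, \ell \setminus j} - \Sigma_{\ell \setminus j, j} \Sigma_{j,j}^{-1} \Sigma_{j, \ell \setminus j})$, where

$$\begin{pmatrix} \Sigma_{j,j} & \Sigma_{j, \ell \setminus j} \\ \Sigma_{\ell \setminus j, j} & \Sigma_{\ell \setminus j, \ell \setminus j} \end{pmatrix} := \mathrm{var} \begin{pmatrix} \tilde{\mathcal{N}}_j \\ \tilde{\mathcal{N}}_{\ell \setminus j} \end{pmatrix}.$$

We then identify the vector $\mathcal{N}_\ell$ with a process $\mathcal{N}_\ell$ in $\ell^\infty(\mathcal{U})$, and define the pointwise limit $G$ of this process as

$$G(u) := \lim_{\ell \to \infty} \mathcal{N}_\ell(u) \quad \text{for each } u \in \mathcal{U}_0,$$

where $\mathcal{U}_0$, the union over $j \geq 1$ of the $2^j$ grid points underlying $\pi_j$, is a countable dense subset of $\mathcal{U}$. The pointwise limit exists, since by construction of $\{\pi_\ell\}$ and $\mathcal{U}_0$, for each $u \in \mathcal{U}_0$, we have that $\pi_\ell(u) = u$ for all $\ell \geq \ell(u)$, where $\ell(u)$ is a sufficiently large constant.

By construction $\mathcal{N}_\ell = \mathcal{N}_\ell \circ \pi_\ell =_d \tilde G \circ \pi_\ell$. Therefore, for each $\varepsilon > 0$, there exists $\eta(\varepsilon) > 0$ small enough such that

$$P\left(\sup_{u,u' \in \mathcal{U}_0 : |u-u'| \leq \eta(\varepsilon)} \|G(u) - G(u')\| \geq \varepsilon\right) \leq P\left(\sup_{|u-u'| \leq \eta(\varepsilon)} \sup_k \|G \circ \pi_k(u) - G \circ \pi_k(u')\| \geq \varepsilon\right)$$

$$\leq P\left(\sup_{|u-u'| \leq \eta(\varepsilon)} \sup_k \|\tilde G \circ \pi_k(u) - \tilde G \circ \pi_k(u')\| \geq \varepsilon\right)$$

$$\leq P\left(\sup_{|u-u'| \leq \eta(\varepsilon)} \|\tilde G(u) - \tilde G(u')\| \geq \varepsilon\right) \leq \varepsilon,$$

where the last display is true because $\sup_{|u-u'| \leq \eta} \|\tilde G(u) - \tilde G(u')\| \to 0$ as $\eta \to 0$ almost surely and thus also in probability, by a.s. continuity of sample paths of $\tilde G$. Setting $\varepsilon = 2^{-m}$ for each $m \in \mathbb{N}$ in the above display, and summing the resulting inequalities over $m$, we get a finite number on the right side. Conclude that by the Borel-Cantelli lemma, for almost all $\omega \in \Omega$, $|G(u) - G(u')| \leq 2^{-m}$ for all $|u - u'| \leq \eta(2^{-m})$ for all sufficiently large $m$. This implies that almost all sample paths are uniformly continuous on $\mathcal{U}_0$, and we can extend the process by continuity to a process $\{G(u), u \in \mathcal{U}\}$ with almost all paths that are uniformly continuous.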

In order to show that the law of $G$ is equal to the law of $\tilde G$ in $\ell^\infty(\mathcal{U})$, it suffices to demonstrate that

$$E[g(G)] = E[g(\tilde G)] \quad \text{for all } g : |g(z) - g(\tilde z)| \leq \sup_{u \in \mathcal{U}} \|z(u) - \tilde z(u)\| \wedge 1.$$

We have that

$$|E[g(G)] - E[g(\tilde G)]| \leq |E[g(G \circ \pi_\ell)] - E[g(\tilde G \circ \pi_\ell)]| + E\left[\sup_{u \in \mathcal{U}} \|G \circ \pi_\ell(u) - G(u)\| \wedge 1\right] + E\left[\sup_{u \in \mathcal{U}} \|\tilde G \circ \pi_\ell(u) - \tilde G(u)\| \wedge 1\right] \to 0 \text{ as } \ell \to \infty.$$

The first term converges to zero by construction, and the second and third terms converge to zero by the dominated convergence theorem and by

$$G \circ \pi_\ell \to G \quad \text{and} \quad \tilde G \circ \pi_\ell \to \tilde G \text{ in } [\ell^\infty(\mathcal{U})]^m \text{ as } \ell \to \infty \text{ a.s.},$$

holding due to a.s. uniform continuity of sample paths of $G$ and $\tilde G$. □

Appendix G. Technical Lemmas on Bounding Empirical Errors

In Appendix G.2 we establish technical results needed for our main results - uniform rates of convergence, uniform linear approximations, and uniform central limit theorem - under high-level conditions. In Appendix G.3 we verify that these conditions are implied by the primitive Condition S stated in Section 2.

G.1. Some Preliminary Lemmas.

Lemma 12. Under the conditions S.2 and S.5, for any $u \in \mathcal{U}$ and $a \in S^{m-1}$,

$$|a'(\tilde J_m(u) - J_m(u))a| \lesssim m^{-\kappa} = o(1),$$

where $\tilde J_m(u) = E[f_{Y|X}(Z'\beta(u)|X) Z Z']$.

Proof of Lemma 12. For any $a \in S^{m-1}$,

$$|a'(\tilde J_m(u) - J_m(u))a| \leq E[|f_{Y|X}(Z'\beta(u) + R(X,u)|X) - f_{Y|X}(Z'\beta(u)|X)| (Z'a)^2] \lesssim a'\Sigma_m a \cdot \bar f' m^{-\kappa}.$$

The result follows since $\Sigma_m$ has bounded eigenvalues, $\kappa > 0$ and $m \to \infty$ as $n \to \infty$. □

Lemma 13 (Auxiliary Matrix). Under conditions S.1-S.5, for $u', u \in \mathcal{U}$ we have that uniformly over $z \in \mathcal{Z}$

$$\frac{|z'(J_m^{-1}(u') - J_m^{-1}(u)) \mathbb{U}_n(u')|}{\sqrt{u(1-u) z' J_m^{-1}(u) \Sigma_m J_m^{-1}(u) z}} \lesssim_P |u - u'| \sqrt{m \log n}.$$

Proof. Recall that $J_m(u) = E[f_{Y|X}(Q_{Y|X}(u|X)|X) Z Z']$ for any $u \in \mathcal{U}$. Moreover, under S.1-S.5, we have $\|J_m^{-1}(u') \mathbb{U}_n(u')\| \lesssim_P \sqrt{m \log n}$ uniformly in $u' \in \mathcal{U}$ by Lemma 23 and Corollary 2 of Appendix G.

Using the matrix identity $A^{-1} - B^{-1} = B^{-1}(B - A)A^{-1}$,

$$J_m^{-1}(u') - J_m^{-1}(u) = J_m^{-1}(u)(J_m(u) - J_m(u')) J_m^{-1}(u').$$

Moreover, since $|f_{Y|X}(Q_{Y|X}(u|x)|x) - f_{Y|X}(Q_{Y|X}(u'|x)|x)| \leq (\bar f'/\underline{f})|u - u'|$ by Lemma 14,

$$J_m(u) - J_m(u') \preceq (\bar f'/\underline{f})|u - u'| \Sigma_m, \quad \text{and} \quad J_m(u') - J_m(u) \preceq (\bar f'/\underline{f})|u - u'| \Sigma_m,$$

where the inequalities are in the positive semi-definite sense. Using these relations and the definition of $s_n(u,x)$ we obtain

$$\frac{|z'(J_m^{-1}(u') - J_m^{-1}(u)) \mathbb{U}_n(u')|}{\sqrt{u(1-u) z' J_m^{-1}(u) \Sigma_m J_m^{-1}(u) z}} = \frac{|z' J_m^{-1}(u)(J_m(u) - J_m(u')) J_m^{-1}(u') \mathbb{U}_n(u')|}{s_n(u,x)} \lesssim_P (\bar f'/\underline{f})|u - u'| \,\mathrm{maxeig}(\Sigma_m) \sqrt{m \log n}.$$

The result follows since $\bar f'$ is bounded above, $\underline{f}$ is bounded from below, and the eigenvalues of $\Sigma_m$ are bounded above and below by constants uniformly in $n$ by S.2 and S.3. □

Lemma 14 (Primitive Condition S.2). Under the condition S.2, there are positive constants $c, C, C_1', C_2', C''$ such that the conditional quantile functions satisfy the following properties uniformly over $u, u' \in \mathcal{U}$, $x \in \mathcal{X}$:

(i) $c|u - u'| \leq |Q_{Y|X}(u|x) - Q_{Y|X}(u'|x)| \leq C|u - u'|$;

(ii) $|f_{Y|X}(Q_{Y|X}(u|x)|x) - f_{Y|X}(Q_{Y|X}(u'|x)|x)| \leq C_1'|u - u'|$;

(iii) $f_{Y|X}(y|x) \leq 1/c$ and $|f'_{Y|X}(y|x)| \leq C_2'$;

(iv) $\left|\frac{\partial^2}{\partial u^2} Q_{Y|X}(u|x)\right| \leq C''$.

Proof. Under S.2, $f_{Y|X}(\cdot|x)$ is a differentiable function so that $Q_{Y|X}(\cdot|x)$ is twice differentiable.

To show the first statement note that $\frac{\partial}{\partial u} Q_{Y|X}(u|x) = \frac{1}{f_{Y|X}(Q_{Y|X}(u|x)|x)}$ by an application of the inverse function theorem. Recall that $\underline{f} = \inf_{x \in \mathcal{X}, u \in \mathcal{U}} f_{Y|X}(Q_{Y|X}(u|x)|x) > 0$ and $\sup_{x \in \mathcal{X}, y \in \mathbb{R}} f_{Y|X}(y|x) \leq \bar f$. This proves the first statement for $c = 1/\bar f$ and $C = 1/\underline{f}$.

To show the second statement let $\bar f' = \sup_{x \in \mathcal{X}, y \in \mathbb{R}} \left|\frac{\partial}{\partial y} f_{Y|X}(y|x)\right|$ and

$$C_1' = \sup_{x \in \mathcal{X}, u \in \mathcal{U}} \left|\frac{\partial}{\partial u} f_{Y|X}(Q_{Y|X}(u|x)|x)\right| = \sup_{x \in \mathcal{X}, u \in \mathcal{U}} \left|\frac{\partial}{\partial y} f_{Y|X}(y|x)\Big|_{y = Q_{Y|X}(u|x)} \frac{\partial}{\partial u} Q_{Y|X}(u|x)\right| \leq \frac{\bar f'}{\underline{f}}.$$

By a Taylor expansion we have

$$|f_{Y|X}(Q_{Y|X}(u|x)|x) - f_{Y|X}(Q_{Y|X}(u'|x)|x)| \leq C_1'|u - u'|.$$

The second part of the third statement follows with $C_2' = \bar f'$. The first part of the third statement was already shown in the proof of part (i).

For the fourth statement, using the implicit function theorem for second order derivatives,

$$\frac{\partial^2}{\partial u^2} Q_{Y|X}(u|x) = -\frac{\partial}{\partial y} f_{Y|X}(y|x)\Big|_{y = Q_{Y|X}(u|x)} \left(\frac{\partial}{\partial u} Q_{Y|X}(u|x)\right)^3 = -\frac{f'_{Y|X}(Q_{Y|X}(u|x)|x)}{f^3_{Y|X}(Q_{Y|X}(u|x)|x)}.$$

Thus, the statement holds with $C'' = \bar f'/\underline{f}^3$.

Under S.2, we can take $c$, $C$, $C_1'$, $C_2'$, and $C''$ to be fixed positive constants uniformly over $n$. □
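
The two derivative identities used in the proof of Lemma 14, $Q' = 1/f(Q)$ and $Q'' = -f'(Q)/f^3(Q)$, can be sanity-checked on a distribution with closed-form quantiles. The sketch below is an illustration only: the choice of the standard logistic law (with no covariates) and all function names are assumptions made for this example, and the derivatives of $Q$ are approximated by central finite differences.

```python
import math


def logistic_quantile(u):
    """Quantile function Q(u) = log(u / (1 - u)) of the standard logistic law."""
    return math.log(u / (1.0 - u))


def logistic_density(y):
    """Density f(y) = e^{-y} / (1 + e^{-y})^2."""
    e = math.exp(-y)
    return e / (1.0 + e) ** 2


def logistic_density_derivative(y):
    """Derivative f'(y) = f(y) * (1 - 2 F(y)), with F the logistic cdf."""
    cdf = 1.0 / (1.0 + math.exp(-y))
    return logistic_density(y) * (1.0 - 2.0 * cdf)


def first_derivative_fd(u, step=1e-5):
    """Central finite-difference approximation of Q'(u)."""
    return (logistic_quantile(u + step) - logistic_quantile(u - step)) / (2.0 * step)


def second_derivative_fd(u, step=1e-4):
    """Central finite-difference approximation of Q''(u)."""
    return (
        logistic_quantile(u + step)
        - 2.0 * logistic_quantile(u)
        + logistic_quantile(u - step)
    ) / step**2


def check_identities(u):
    """Return (error in Q' = 1/f(Q), error in Q'' = -f'(Q)/f(Q)^3)."""
    q = logistic_quantile(u)
    f = logistic_density(q)
    err1 = abs(first_derivative_fd(u) - 1.0 / f)
    err2 = abs(second_derivative_fd(u) + logistic_density_derivative(q) / f**3)
    return err1, err2
```

For the logistic law $f(Q(u)) = u(1-u)$, which is bounded away from zero only when $u$ ranges over a compact subset of (0, 1), in line with the role of $\underline{f}$ in the lemma.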

G.2. Maximal Inequalities. In this section we derive maximal inequalities that are needed for verifying the preliminary high-level conditions. These inequalities rely mainly on uniform entropy bounds and VC classes of functions. In what follows $\mathcal{F}$ denotes a class of functions whose envelope is $F$. Recall that for a probability measure $Q$ with $\|F\|_{Q,p} > 0$, $N(\varepsilon \|F\|_{Q,p}, \mathcal{F}, L_p(Q))$ denotes the covering number under the specified metric (i.e., the minimum number of $L_p(Q)$-balls of radius $\varepsilon \|F\|_{Q,p}$ needed to cover $\mathcal{F}$). We refer to Dudley [17] for the details of the definitions.

Suppose that we have the following upper bound on the $L_2(P)$ covering numbers for $\mathcal{F}$:

$$N(\varepsilon \|F\|_{P,2}, \mathcal{F}, L_2(P)) \leq n(\varepsilon, \mathcal{F}, P) \quad \text{for each } \varepsilon > 0,$$

where $n(\varepsilon, \mathcal{F}, P)$ is increasing in $1/\varepsilon$, and $\varepsilon \sqrt{\log n(\varepsilon, \mathcal{F}, P)} \to 0$ as $1/\varepsilon \to \infty$ and is decreasing in $1/\varepsilon$. Let $\rho(\mathcal{F}, P) := \sup_{f \in \mathcal{F}} \|f\|_{P,2}/\|F\|_{P,2}$. Let us call a threshold function $x : \mathbb{R}^n \mapsto \mathbb{R}$ $k$-sub-exchangeable if, for any $v, w \in \mathbb{R}^n$ and any vectors $\tilde v, \tilde w$ created by the pairwise exchange of the components in $v$ with components in $w$, we have that $x(\tilde v) \vee x(\tilde w) \geq [x(v) \vee x(w)]/k$. Several functions satisfy this property, in particular $x(v) = \|v\|$ with $k = \sqrt{2}$ and constant functions with $k = 1$.
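
The claim that the Euclidean norm is $\sqrt{2}$-sub-exchangeable can be illustrated numerically. The sketch below is not taken from the paper: the helper names, the dimension, and the random swap pattern are assumptions made for this example. It exchanges a random subset of components between two vectors and checks the defining inequality.

```python
import math
import random


def euclidean_norm(v):
    """Euclidean norm of a list of floats."""
    return math.sqrt(sum(x * x for x in v))


def exchange_components(v, w, swap_mask):
    """Swap v[i] and w[i] wherever swap_mask[i] is True; return the new pair."""
    v_new = [wi if s else vi for vi, wi, s in zip(v, w, swap_mask)]
    w_new = [vi if s else wi for vi, wi, s in zip(v, w, swap_mask)]
    return v_new, w_new


def sub_exchangeability_holds(v, w, swap_mask, k=math.sqrt(2.0)):
    """Check max(x(v~), x(w~)) >= max(x(v), x(w)) / k for x = Euclidean norm."""
    v_new, w_new = exchange_components(v, w, swap_mask)
    lhs = max(euclidean_norm(v_new), euclidean_norm(w_new))
    rhs = max(euclidean_norm(v), euclidean_norm(w)) / k
    return lhs >= rhs - 1e-12  # small slack for floating-point rounding


def run_random_trials(num_trials=1000, dim=6, seed=0):
    """Return True if the inequality holds in every random trial."""
    rng = random.Random(seed)
    for _ in range(num_trials):
        v = [rng.gauss(0.0, 1.0) for _ in range(dim)]
        w = [rng.gauss(0.0, 1.0) for _ in range(dim)]
        mask = [rng.random() < 0.5 for _ in range(dim)]
        if not sub_exchangeability_holds(v, w, mask):
            return False
    return True
```

The inequality follows because an exchange preserves $\|v\|^2 + \|w\|^2$; the pair $v = (1, 1)$, $w = (0, 0)$ with only the first component swapped attains it with equality, so the constant $\sqrt{2}$ cannot be improved for the norm.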

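Lemma 15 (Exponential inequality for separable empirical process). Consider a separable empirical process $\mathbb{G}_n(f) = n^{-1/2} \sum_{i=1}^n \{f(Z_i) - E[f(Z_i)]\}$ and the empirical measure $\mathbb{P}_n$ for $Z_1, \ldots, Z_n$, an underlying independent data sequence. Let $K > 1$ and $\tau \in (0,1)$ be constants, and $e_n(\mathcal{F}, \mathbb{P}_n) = e_n(\mathcal{F}, Z_1, \ldots, Z_n)$ be a $k$-sub-exchangeable random variable, such that

$$\|F\|_{\mathbb{P}_n,2} \int_0^{\rho(\mathcal{F}, \mathbb{P}_n)/4} \sqrt{\log n(\varepsilon, \mathcal{F}, \mathbb{P}_n)}\, d\varepsilon \leq e_n(\mathcal{F}, \mathbb{P}_n) \quad \text{and} \quad \sup_{f \in \mathcal{F}} \mathrm{var}_P f \leq \frac{\tau}{2} (4kcK e_n(\mathcal{F}, \mathbb{P}_n))^2$$

for some universal constant $c > 1$, then

$$P\left\{\sup_{f \in \mathcal{F}} |\mathbb{G}_n(f)| \geq 4kcK e_n(\mathcal{F}, \mathbb{P}_n)\right\} \leq \frac{4}{\tau} E_P\left(\left[\int_0^{\rho(\mathcal{F}, \mathbb{P}_n)/2} \varepsilon^{-1} n(\varepsilon, \mathcal{F}, \mathbb{P}_n)^{-\{K^2 - 1\}} d\varepsilon\right] \wedge 1\right) + \tau.$$

Proof. See [6], Lemma 18, and note that the proof does not use that the $Z_i$'s are i.i.d., only that they are independent, which was the requirement of Lemma 17 of [6]. □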



The next lemma establishes a new maximal inequality which will be used in the following sections.

Lemma 16. Suppose for all large $m$ and all $0 < \varepsilon \leq \varepsilon_0$

$$n(\varepsilon, \mathcal{F}_m, P) \leq (\omega/\varepsilon)^{J(m)^2} \quad \text{and} \quad n(\varepsilon, \mathcal{F}_m^2, P) \leq (\omega/\varepsilon)^{J(m)^2}, \quad \text{(G.35)}$$

for some $\omega$ such that $\log \omega \lesssim \log n$, and let $F_m = \sup_{f \in \mathcal{F}_m} |f|$ denote the envelope function associated with $\mathcal{F}_m$.

1. (A Maximal Inequality Based on Entropy and Moments) Then, as $n$ grows we have

$$\sup_{f \in \mathcal{F}_m} |\mathbb{G}_n(f)| \lesssim_P J(m) \left(\sup_{f \in \mathcal{F}_m} E[f^2] + n^{-1/2} J(m) \log^{1/2} n \left(\sup_{f \in \mathcal{F}_m} \mathbb{E}_n[f^4] \vee E[f^4]\right)^{1/2}\right)^{1/2} \log^{1/2} n.$$

2. (A Maximal Inequality Based on Entropy, Moments, and Extremum) Suppose that $F_m \leq M(m,n)$ with probability going to 1 as $n$ grows. Then as $n$ grows we have

$$\sup_{f \in \mathcal{F}_m} |\mathbb{G}_n(f)| \lesssim_P J(m) \left(\sup_{f \in \mathcal{F}_m} E[f^2] + n^{-1} J(m)^2 M^2(m,n) \log n\right)^{1/2} \log^{1/2} n.$$

Proof. We divide the proof into steps. Step 1 is the main argument, Step 2 is an application of Lemma 15, and Step 3 contains some auxiliary calculations.

Step 1. (Main Argument) Proof of Part 1. By Step 2 below, which invokes Lemma 15,

$$\sup_{f \in \mathcal{F}_m} |\mathbb{G}_n(f)| \lesssim_P J(m) \sqrt{\log n} \sup_{f \in \mathcal{F}_m} (\mathbb{E}_n[f^2]^{1/2} \vee E[f^2]^{1/2}). \quad \text{(G.36)}$$

We can assume that $\sup_{f \in \mathcal{F}_m} \mathbb{E}_n[f^2]^{1/2} \geq \sup_{f \in \mathcal{F}_m} E[f^2]^{1/2}$ throughout the proof, otherwise we are done with both bounds.

Again by Step 2 and (G.35) applied to $\mathcal{F}_m^2$, we also have

$$\sup_{f \in \mathcal{F}_m} |\mathbb{E}_n[f^2] - E[f^2]| = n^{-1/2} \sup_{f \in \mathcal{F}_m} |\mathbb{G}_n(f^2)| \lesssim_P n^{-1/2} J(m) \sqrt{\log n} \sup_{f \in \mathcal{F}_m} (\mathbb{E}_n[f^4] \vee E[f^4])^{1/2}. \quad \text{(G.37)}$$

Thus we have

$$\sup_{f \in \mathcal{F}_m} \mathbb{E}_n[f^2] \lesssim_P \sup_{f \in \mathcal{F}_m} E[f^2] + n^{-1/2} J(m) \log^{1/2} n \sup_{f \in \mathcal{F}_m} (\mathbb{E}_n[f^4] \vee E[f^4])^{1/2}. \quad \text{(G.38)}$$

Therefore, inserting the bound (G.38) in equation (G.36) yields the result.

Proof of Part 2. One more time we can assume that $\sup_{f \in \mathcal{F}_m} \mathbb{E}_n[f^2] \geq \sup_{f \in \mathcal{F}_m} E[f^2]$, otherwise we are done. By (G.37) we have

$$\sup_{f \in \mathcal{F}_m} |\mathbb{E}_n[f^2] - E[f^2]| \lesssim_P n^{-1/2} J(m) \log^{1/2} n \sup_{f \in \mathcal{F}_m} (\mathbb{E}_n[f^4] \vee E[f^4])^{1/2} \lesssim_P n^{-1/2} J(m) \log^{1/2} n\, M(m,n) \sup_{f \in \mathcal{F}_m} \mathbb{E}_n[f^2]^{1/2},$$

where we used that $f^4 \leq f^2 M^2(m,n)$ with probability going to 1. Since for positive numbers $a$, $c$, and $x$, $x \leq a + c|x|^{1/2}$ implies that $x \leq 4a + 4c^2$, we conclude

$$\sup_{f \in \mathcal{F}_m} \mathbb{E}_n[f^2] \lesssim_P \sup_{f \in \mathcal{F}_m} E[f^2] + n^{-1} J(m)^2 M^2(m,n) \log n.$$

Inserting the bound in equation (G.36) gives the result.
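
The elementary implication used in the proof of Part 2 (for positive $a$, $c$, $x$, the inequality $x \leq a + c\,x^{1/2}$ implies $x \leq 4a + 4c^2$) can be checked directly. The sketch below is illustrative and its function names are assumptions: it computes the largest $x$ compatible with the hypothesis in closed form and compares it with the stated bound.

```python
import math


def largest_solution(a, c):
    """Largest x >= 0 with x <= a + c * sqrt(x), for a, c > 0.

    Writing s = sqrt(x), the constraint is s^2 - c*s - a <= 0, so the
    largest admissible s is the positive root of s^2 - c*s - a = 0.
    """
    s = (c + math.sqrt(c * c + 4.0 * a)) / 2.0
    return s * s


def implied_bound(a, c):
    """The bound 4a + 4c^2 used in the proof."""
    return 4.0 * a + 4.0 * c * c
```

Because every admissible $x$ is at most `largest_solution(a, c)`, verifying the bound at that single point covers all $x$ satisfying the hypothesis.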

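Step 2. (Applying Lemma 15) We apply Lemma 15 to $\mathcal{F}_m$ with $\tau_m = 1/(4J(m)^2[K^2 - 1])$ for some large constant $K$ to be set later, and

$$e_n(\mathcal{F}_m, \mathbb{P}_n) = J(m) \sqrt{\log n} \left(\sup_{f \in \mathcal{F}_m} \mathbb{E}_n[f^2]^{1/2} \vee E[f^2]^{1/2}\right),$$

assuming that $n$ is sufficiently large (i.e., $n \geq \omega$). We observe that by (G.35), the bound $\varepsilon \mapsto n(\varepsilon, \mathcal{F}_m, \mathbb{P}_n)$ satisfies the monotonicity hypotheses of Lemma 15. Next note that $e_n(\mathcal{F}_m, \mathbb{P}_n)$ is $\sqrt{2}$-sub-exchangeable, because $\sup_{f \in \mathcal{F}_m} \|f\|_{\mathbb{P}_n,2}$ is $\sqrt{2}$-sub-exchangeable, and $\rho(\mathcal{F}_m, \mathbb{P}_n) := \sup_{f \in \mathcal{F}_m} \|f\|_{\mathbb{P}_n,2}/\|F_m\|_{\mathbb{P}_n,2} \geq 1/\sqrt{n}$ by Step 3 below. Thus,

$$\|F_m\|_{\mathbb{P}_n,2} \int_0^{\rho(\mathcal{F}_m, \mathbb{P}_n)/4} \sqrt{\log n(\varepsilon, \mathcal{F}_m, P)}\, d\varepsilon \leq \|F_m\|_{\mathbb{P}_n,2} \int_0^{\rho(\mathcal{F}_m, \mathbb{P}_n)/4} J(m) \sqrt{\log(\omega/\varepsilon)}\, d\varepsilon$$

$$\leq J(m) \sqrt{\log(n \vee \omega)} \sup_{f \in \mathcal{F}_m} \|f\|_{\mathbb{P}_n,2}/2 \leq e_n(\mathcal{F}_m, \mathbb{P}_n),$$

which follows by $\int_0^\rho \sqrt{\log(\omega/\varepsilon)}\, d\varepsilon \leq (\int_0^\rho 1\, d\varepsilon)^{1/2} (\int_0^\rho \log(\omega/\varepsilon)\, d\varepsilon)^{1/2} \leq \rho \sqrt{2 \log(n \vee \omega)}$, for $1/\sqrt{n} \leq \rho \leq 1$.

Let $K > 1$ be sufficiently large (to be set below). Recall that $4\sqrt{2}c > 4$ where $c > 1$ is universal. Note that for any $f \in \mathcal{F}_m$, by the Chebyshev inequality

$$P(|\mathbb{G}_n(f)| > 4\sqrt{2}cK e_n(\mathcal{F}_m, \mathbb{P}_n)) \leq \frac{\sup_{f \in \mathcal{F}_m} \|f\|^2_{P,2}}{(4\sqrt{2}cK e_n(\mathcal{F}_m, \mathbb{P}_n))^2} \leq \frac{1}{(4\sqrt{2}cK)^2 J(m)^2 \log n} \leq \tau_m/2.$$

By Lemma 15 with our choice of $\tau_m$, $\omega > 1$, and $\rho(\mathcal{F}_m, \mathbb{P}_n) \leq 1$,

$$P\left\{\sup_{f \in \mathcal{F}_m} |\mathbb{G}_n(f)| > 4\sqrt{2}cK e_n(\mathcal{F}_m, \mathbb{P}_n)\right\} \leq \frac{4}{\tau_m} \int_0^{1/2} (\omega/\varepsilon)^{-J(m)^2[K^2 - 1]} \varepsilon^{-1} d\varepsilon + \tau_m \leq \frac{4}{\tau_m} \frac{(1/[2\omega])^{J(m)^2[K^2 - 1]}}{J(m)^2[K^2 - 1]} + \tau_m,$$

which can be made arbitrarily small by choosing $K$ sufficiently large (and recalling that $\tau_m \to 0$ as $K$ grows).

Step 3. (Auxiliary calculations.) To establish that $\sup_{f \in \mathcal{F}_m} \|f\|_{\mathbb{P}_n,2}$ is $\sqrt{2}$-sub-exchangeable, define $\tilde Z$ and $\tilde Y$ by exchanging any components in $Z$ with corresponding components in $Y$. Then

$$\sqrt{2}\left(\sup_{f \in \mathcal{F}_m} \|f\|_{\mathbb{P}_n(\tilde Z),2} \vee \sup_{f \in \mathcal{F}_m} \|f\|_{\mathbb{P}_n(\tilde Y),2}\right) \geq \left(\sup_{f \in \mathcal{F}_m} \|f\|^2_{\mathbb{P}_n(\tilde Z),2} + \sup_{f \in \mathcal{F}_m} \|f\|^2_{\mathbb{P}_n(\tilde Y),2}\right)^{1/2}$$

$$\geq \left(\sup_{f \in \mathcal{F}_m} \mathbb{E}_n[f(\tilde Z_i)^2] + \mathbb{E}_n[f(\tilde Y_i)^2]\right)^{1/2} = \left(\sup_{f \in \mathcal{F}_m} \mathbb{E}_n[f(Z_i)^2] + \mathbb{E}_n[f(Y_i)^2]\right)^{1/2}$$

$$\geq \left(\sup_{f \in \mathcal{F}_m} \|f\|^2_{\mathbb{P}_n(Z),2} \vee \sup_{f \in \mathcal{F}_m} \|f\|^2_{\mathbb{P}_n(Y),2}\right)^{1/2} = \sup_{f \in \mathcal{F}_m} \|f\|_{\mathbb{P}_n(Z),2} \vee \sup_{f \in \mathcal{F}_m} \|f\|_{\mathbb{P}_n(Y),2}.$$

Next we show that $\rho(\mathcal{F}_m, \mathbb{P}_n) := \sup_{f \in \mathcal{F}_m} \|f\|_{\mathbb{P}_n,2}/\|F_m\|_{\mathbb{P}_n,2} \geq 1/\sqrt{n}$. The latter bound follows from $\mathbb{E}_n[F_m^2] = \mathbb{E}_n[\sup_{f \in \mathcal{F}_m} |f(Z_i)|^2] \leq \sup_{i \leq n} \sup_{f \in \mathcal{F}_m} |f(Z_i)|^2$, and from the inequality $\sup_{f \in \mathcal{F}_m} \mathbb{E}_n[|f(Z_i)|^2] \geq \sup_{f \in \mathcal{F}_m} \sup_{i \leq n} |f(Z_i)|^2/n$. □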

The last technical lemma in this section bounds the uniform entropy for VC classes of functions (we refer to Dudley [17] for formal definitions).

Lemma 17 (Uniform Entropy of VC classes). Suppose $\mathcal{F}$ has VC index $V$. As $\varepsilon > 0$ goes to zero we have for $J = O(V)$

$$\sup_Q N(\varepsilon \|F\|_{Q,2}, \mathcal{F}, L_2(Q)) \leq (1/\varepsilon)^J,$$

where $Q$ ranges over all discrete probability measures.

Proof. Being a VC class of index $V$, by Theorem 2.6.7 in [37] we have that the bound $\sup_Q \log N(\varepsilon \|F\|_{Q,2}, \mathcal{F}, L_2(Q)) \lesssim V \log(1/\varepsilon)$ holds for $\varepsilon$ sufficiently small (also making the expression bigger than 1). □

Comment G.1. Although the product of two VC classes of functions may not be a VC class, if $\mathcal{F}$ has VC index $V$, the square of $\mathcal{F}$ is still a VC class whose VC index is at most $2V$.

G.3. Bounds on Various Empirical Errors. In this section we provide probabilistic bounds for the error terms under the primitive Condition S. Our results rely on empirical process techniques. In particular, they rely on the maximal inequalities derived in Section G.2.

We start with a sequence of technical lemmas which are used in the proofs of the lemmas that bound the error terms $\epsilon_0$ - $\epsilon_6$.

Lemma 18. Let $r = o(1)$. The class of functions

$$\mathcal{F}_{m,n} = \{a'(\psi_i(\beta, u) - \psi_i(\beta(u), u)) : u \in \mathcal{U}, \|a\| \leq 1, \|\beta - \beta(u)\| \leq r\}$$

has VC index of $O(m)$.

Proof. Consider the classes $\mathcal{W} := \{Z_i'a : a \in \mathbb{R}^m\}$ and $\mathcal{V} := \{1\{Y_i \leq Z_i'\beta\} : \beta \in \mathbb{R}^m\}$ (for convenience let $A_i = (Z_i, Y_i)$). Their VC index is bounded by $m + 2$. Next consider $f \in \mathcal{F}_{m,n}$, which can be written in the form $f(A_i) := g(A_i)(1\{h(A_i) \leq 0\} - 1\{p(A_i) \leq 0\})$ where $g \in \mathcal{W}$, and $1\{h \leq 0\}$ and $1\{p \leq 0\} \in \mathcal{V}$. Then

$$\{(A_i, t) : f(A_i) \leq t\} = \{(A_i, t) : g(A_i)(1\{h(A_i) \leq 0\} - 1\{p(A_i) \leq 0\}) \leq t\}$$

$$= \{(A_i, t) : h(A_i) > 0, p(A_i) > 0, t \geq 0\} \cup$$

$$\cup \{(A_i, t) : h(A_i) \leq 0, p(A_i) \leq 0, t \geq 0\} \cup$$

$$\cup \{(A_i, t) : h(A_i) \leq 0, p(A_i) > 0, g(A_i) \leq t\} \cup$$

$$\cup \{(A_i, t) : h(A_i) > 0, p(A_i) \leq 0, -g(A_i) \leq t\}.$$

Since each one of the sets can be written as three intersections of basic sets, it follows that $\mathcal{F}_{m,n}$ has VC index at most $O(m)$. □

Lemma 19. The class of functions

$$\mathcal{H}_{m,n} = \{1\{|Y_i - Z_i'\beta| \leq h\}(a'Z_i)^2 : \|\beta - \beta(u)\| \leq r, h \in (0, H], a \in S^{m-1}\}$$

has VC index of $O(m)$.

Proof. The proof is similar to that of Lemma 18. □

Lemma 20. The family of functions $\mathcal{G}_{m,n} = \{a'\psi_i(\beta(u), u) : u \in \mathcal{U}, a \in S^{m-1}\}$ has VC index of $O(m)$.

Proof. The proof is similar to the proof of Lemma 18. □

Lemma 21. The family of functions

$$\mathcal{A}_{n,m} = \{a'Z(1\{Y \leq Z'\beta(u) + R(X,u)\} - 1\{Y \leq Z'\beta(u)\}) : a \in S^{m-1}, u \in \mathcal{U}\}$$

has VC index of $O(m)$.

Proof. The key observation is that the function $Z'\beta(u) + R(X,u)$ is monotone in $u$ so that

$$\{1\{Y \leq Z'\beta(u) + R(X,u)\} : u \in \mathcal{U}\}$$

has VC index of 1 and that $\{1\{Y \leq Z'\beta(u)\} : u \in \mathcal{U}\} \subset \{1\{Y \leq Z'\beta\} : \beta \in \mathbb{R}^m\}$. The proof then follows similarly to Lemma 18. □

Consider the maximum between the maximum eigenvalue associated with the empirical Gram matrix and the population Gram matrix

$$\phi_n = \max_{a \in S^{m-1}} \mathbb{E}_n[(a'Z_i)^2] \vee E[(a'Z_i)^2]. \quad \text{(G.39)}$$

The factor $\phi_n$ will be used to bound the quantities $\epsilon_0$ and $\epsilon_1$ in the analysis for the rate of convergence. Next we state a result due to Guédon and Rudelson [18] specialized to our framework.

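Theorem 13 (Guédon and Rudelson [18]). Let $Z_i \in \mathbb{R}^m$, $i = 1, \ldots, n$, be random vectors such that

$$\delta^2 := \frac{\log n}{n} \cdot \frac{E[\max_{i \leq n} \|Z_i\|^2]}{\max_{a \in S^{m-1}} E[(Z_i'a)^2]} < 1.$$

Then we have

$$E\left[\max_{a \in S^{m-1}} \left|\frac{1}{n} \sum_{i=1}^n (Z_i'a)^2 - E[(Z_i'a)^2]\right|\right] \leq 2\delta \cdot \max_{a \in S^{m-1}} E[(Z_i'a)^2].$$

Corollary 2. Under Condition S and $\zeta_m^2 \log n = o(n)$, for $\lambda_{\max} = \max_{a \in S^{m-1}} E[(Z_i'a)^2]$ we have that for $n$ large enough $\phi_n$ as defined in (G.39) satisfies

$$E[\phi_n] \leq \left(1 + 2\sqrt{\frac{\zeta_m^2 \log n}{n \lambda_{\max}}}\right) \lambda_{\max} \quad \text{and} \quad P(\phi_n > 2\lambda_{\max}) \leq 2\sqrt{\frac{\zeta_m^2 \log n}{n \lambda_{\max}}}.$$

Proof. Let $\delta$ be defined as in Theorem 13. Next note that $E[\max_{i = 1, \ldots, n} \|Z_i\|^2] \lesssim \zeta_m^2$ under S.4, and $\lambda_{\max} \lesssim 1$ and $\lambda_{\max} \gtrsim 1$ under S.3. Therefore, $\delta^2 \lesssim (\zeta_m^2 \log n)/n$ in Theorem 13. The growth condition $\zeta_m^2 \log n = o(n)$ yields $\delta < 1$ as $n$ grows.

The first result follows by applying Theorem 13 and the triangle inequality.

To show the second relation note that the event $\{\phi_n > 2\lambda_{\max}\}$ cannot occur if $\phi_n = \max_{a \in S^{m-1}} E[(Z_i'a)^2] = \lambda_{\max}$. Thus

$$P(\phi_n > 2\lambda_{\max}) = P\left(\max_{a \in S^{m-1}} \mathbb{E}_n[(Z_i'a)^2] > 2\lambda_{\max}\right)$$

$$\leq P\left(\max_{a \in S^{m-1}} |\mathbb{E}_n[(Z_i'a)^2] - E[(Z_i'a)^2]| > \lambda_{\max}\right)$$

$$\leq E\left[\max_{a \in S^{m-1}} |\mathbb{E}_n[(Z_i'a)^2] - E[(Z_i'a)^2]|\right]/\lambda_{\max}$$

$$\leq 2\delta,$$

by the triangle inequality, the Markov inequality and Theorem 13. □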



Next we proceed to bound the various approximation error terms.

Lemma 22 (Controlling error $\epsilon_1$). Under conditions S.1-S.4 we have

$$\epsilon_1(m,n) \lesssim_P \sqrt{m \log n\, \phi_n}.$$

Proof. Consider the class of functions $\mathcal{F}_{m,n}$ defined in Lemma 18 so that

$$\epsilon_1(m,n) = \sup_{f \in \mathcal{F}_{m,n}} |\mathbb{G}_n(f)|.$$

From Lemma 17 we have that $J(m) \lesssim \sqrt{m}$. By Step 2 of Lemma 16, see equation (G.36),

$$\sup_{f \in \mathcal{F}_{m,n}} |\mathbb{G}_n(f)| \lesssim_P \sqrt{m \log n} \sup_{f \in \mathcal{F}_{m,n}} (\mathbb{E}_n[f^2] \vee E[f^2])^{1/2}. \quad \text{(G.40)}$$

The score function $\psi_i(\cdot, \cdot)$ satisfies the following inequality for any $a \in S^{m-1}$:

$$|(\psi_i(\beta, u) - \psi_i(\beta(u), u))'a| = |a'Z_i| \, |1\{Y_i \leq Z_i'\beta\} - 1\{Y_i \leq Z_i'\beta(u)\}| \leq |a'Z_i|.$$

Therefore

$$\mathbb{E}_n[f^2] \leq \mathbb{E}_n[|a'Z_i|^2] \leq \phi_n \quad \text{and} \quad E[f^2] \leq E\left[\frac{1}{n} \sum_{i=1}^n |a'Z_i|^2\right] \leq \phi_n \quad \text{(G.41)}$$

by definition (G.39). Combining (G.41) with (G.40) we obtain the result. □

Lemma 23 (Controlling error $\epsilon_0$ and Pivotal process norm). Under the conditions S.1-S.4

$$\epsilon_0(m,n) \lesssim_P \sqrt{m \log n\, \phi_n} \quad \text{and} \quad \sup_{u \in \mathcal{U}} \|\mathbb{U}_n(u)\| \lesssim_P \sqrt{m \log n\, \phi_n}.$$

Proof. For $\epsilon_0$, the proof is similar to the proof of Lemma 22, relying on Lemma 20 and noting that for $g \in \mathcal{G}_{m,n}$

$$\mathbb{E}_n[g^2] = \mathbb{E}_n[(a'\psi_i(\beta(u), u))^2] = \mathbb{E}_n[(a'Z_i)^2 (1\{Y_i \leq Z_i'\beta(u)\} - u)^2] \leq \mathbb{E}_n[(a'Z_i)^2] \leq \phi_n.$$

Similarly, $E[g^2] \leq \phi_n$.

The second relation follows similarly. □

Lemma 24 (Bounds on $\epsilon_1(m,n)$ and $\epsilon_2(m,n)$). Under conditions S.1-S.4 we have

$$\epsilon_1(m,n) \lesssim_P \sqrt{m \zeta_m r \log n} + \frac{m \zeta_m}{\sqrt{n}} \log n \quad \text{and} \quad \epsilon_2(m,n) \lesssim_P \sqrt{n} \zeta_m r^2 + \sqrt{n} m^{-\kappa} r.$$

Proof. Part 1 ($\epsilon_1(m,n)$). The first bound will follow from the application of the maximal inequality derived in Lemma 16.

For any $B > 0$, define the class of functions $\mathcal{F}_{n,m}$ as in Lemma 18, which characterizes $\epsilon_1$. By the (second) maximal inequality in Lemma 16,

$$\epsilon_1(m,n) = \sup_{f \in \mathcal{F}_{n,m}} |\mathbb{G}_n(f)| \lesssim_P J(m) \left(\sup_{f \in \mathcal{F}_{n,m}} E[f^2] + n^{-1} J(m)^2 M^2(m,n) \log n\right)^{1/2} \log^{1/2} n,$$

where $J(m) \lesssim \sqrt{m}$ by the VC dimension of $\mathcal{F}_{n,m}$ being of $O(m)$, Lemma 18, and $M(m,n) = \max_{1 \leq i \leq n} \|Z_i\| \leq \zeta_m$. The bound stated in the lemma holds provided we can show that

$$\sup_{f \in \mathcal{F}_{n,m}} E[f^2] \lesssim r \zeta_m. \quad \text{(G.42)}$$

The score function $\psi_i(\cdot, \cdot)$ satisfies the following inequality for any $a \in S^{m-1}$:

$$|(\psi_i(\beta, u) - \psi_i(\beta(u), u))'a| = |Z_i'a| \, |1\{Y_i \leq Z_i'\beta\} - 1\{Y_i \leq Z_i'\beta(u)\}| \leq |Z_i'a| \cdot 1\{|Y_i - Z_i'\beta(u)| \leq |Z_i'(\beta - \beta(u))|\}.$$

Thus,

$$E[(a'(\psi_i(\beta, u) - \psi_i(\beta(u), u)))^2] \leq E\left[\int |Z'a|^2 1\{|y - Z'\beta(u)| \leq |Z'(\beta - \beta(u))|\} f_{Y|X}(y|X)\, dy\right]$$

$$\leq E[(Z'a)^2 \cdot \min(2\bar f |Z'(\beta - \beta(u))|, 1)]$$

$$\leq 2\bar f \|\beta - \beta(u)\| \sup_{a \in S^{m-1}, \tilde a \in S^{m-1}} E[|Z_i'a|^2 |Z_i'\tilde a|]$$

$$= 2\bar f \|\beta - \beta(u)\| \sup_{a \in S^{m-1}} E[|Z_i'a|^3],$$

where $\sup_{a \in S^{m-1}} E[|Z_i'a|^3] \lesssim \zeta_m$ by S.3 and S.4. Therefore, we have the upper bound (G.42).

Part 2 ($\epsilon_2(m,n)$). To show the bound for $\epsilon_2$, note that by Lemma 12, for any $a \in S^{m-1}$,

$$\sqrt{n}\, |a'(\tilde J_m(u) - J_m(u))(\beta - \beta(u))| \lesssim \sqrt{n} m^{-\kappa} r.$$

For $a \in S^{m-1}$ and $\tilde J_m(u) = E[f_{Y|X}(Z'\beta(u)|X) Z Z']$, define

$$\epsilon_2(m,n,a) = n^{1/2} |a'(E[\psi(Y, Z, \beta, u)] - E[\psi(Y, Z, \beta(u), u)]) - a'\tilde J_m(u)(\beta - \beta(u))|.$$

Thus $\epsilon_2(m,n) \leq \sup_{a \in S^{m-1}, (u,\beta) \in R_{n,m}} \epsilon_2(m,n,a) + \sqrt{n} m^{-\kappa} r$.

Note that since $E[a'\psi_i(\beta(u), u)] = 0$, for some $\tilde\beta$ in the line segment between $\beta(u)$ and $\beta$, $E[a'\psi_i(\beta, u)] = E[f_{Y|X}(Z_i'\tilde\beta|X)(Z_i'a)Z_i'](\beta - \beta(u))$. Thus, using that $|f_{Y|X}(Z'\tilde\beta|X) - f_{Y|X}(Z'\beta(u)|X)| \leq \bar f' |Z'(\tilde\beta - \beta(u))| \leq \bar f' |Z'(\beta - \beta(u))|$,

$$\epsilon_2(m,n,a) = n^{1/2} |E[(a'Z)(f_{Y|X}(Z'\tilde\beta|X) - f_{Y|X}(Z'\beta(u)|X))Z'](\beta - \beta(u))|$$

$$\leq n^{1/2} E[|a'Z| \, |Z'(\beta - \beta(u))|^2 \bar f']$$

$$\leq n^{1/2} \bar f' \|\beta - \beta(u)\|^2 \sup_{a \in S^{m-1}} E[|a'Z|^3],$$

where $\sup_{a \in S^{m-1}} E[|Z_i'a|^3] \lesssim \zeta_m$ by S.3 and S.4.

□

Lemma 25 (Bounds on Approximation Error for Uniform Linear Approximation). Under Condition S,

$$r_n(u) := \frac{1}{\sqrt{n}} \sum_{i=1}^n Z_i (1\{Y_i \leq Z_i'\beta(u) + R(X_i, u)\} - 1\{Y_i \leq Z_i'\beta(u)\}), \quad u \in \mathcal{U},$$

satisfies

$$\sup_{a \in S^{m-1}, u \in \mathcal{U}} |a'r_n(u)| \lesssim_P \min\left\{\sqrt{m^{1-\kappa} \log n} + \frac{\zeta_m m \log n}{\sqrt{n}},\ \sqrt{n \phi_n m^{-\kappa}}\right\}.$$

Proof. The second bound follows by Cauchy-Schwarz, $|R(X_i, u)| \lesssim m^{-\kappa}$, and the bounded conditional probability density function of $Y$ given $X$.

The proof of the first bound is similar to the bound on $\epsilon_1$ in Lemma 24; it will also follow from the application of the maximal inequality derived in Lemma 16.

Define the class of functions

$$\mathcal{A}_{n,m} = \{a'Z(1\{Y \leq Z'\beta(u) + R(X,u)\} - 1\{Y \leq Z'\beta(u)\}) : a \in S^{m-1}, u \in \mathcal{U}\}.$$

By the (second) maximal inequality in Lemma 16,

$$\sup_{a \in S^{m-1}, u \in \mathcal{U}} |a'r_n(u)| = \sup_{f \in \mathcal{A}_{n,m}} |\mathbb{G}_n(f)| \lesssim_P J(m) \left(\sup_{f \in \mathcal{A}_{n,m}} E[f^2] + n^{-1} J(m)^2 M^2(m,n) \log n\right)^{1/2} \log^{1/2} n,$$

where $J(m) \lesssim \sqrt{m}$ by the VC dimension of $\mathcal{A}_{n,m}$ being of $O(m)$ by Lemma 21, and $M(m,n) = \zeta_m$ by S.4. The bound stated in the lemma holds provided we can show that

$$\sup_{f \in \mathcal{A}_{n,m}} E[f^2] \lesssim m^{-\kappa}. \quad \text{(G.43)}$$

For any $f \in \mathcal{A}_{n,m}$,

$$|f(Y_i, Z_i, X_i)| = |Z_i'a| \, |1\{Y_i \leq Z_i'\beta(u) + R(X_i, u)\} - 1\{Y_i \leq Z_i'\beta(u)\}| \leq |Z_i'a| \cdot 1\{|Y_i - Z_i'\beta(u)| \leq |R(X_i, u)|\}.$$

Thus, since $|f_{Y|X}(y|X)| \leq \bar f$,

$$E[f^2(Y, Z, X)] \leq E\left[\int |Z'a|^2 1\{|y - Z'\beta(u)| \leq |R(X,u)|\} f_{Y|X}(y|X)\, dy\right]$$

$$\leq E[(Z'a)^2 \cdot \min(2\bar f |R(X,u)|, 1)]$$

$$\lesssim 2\bar f m^{-\kappa} \sup_{a \in S^{m-1}} E[|Z_i'a|^2],$$

where $\sup_{a \in S^{m-1}} E[|Z_i'a|^2] \lesssim 1$ by invoking S.3. Therefore, we have the upper bound (G.43).

□

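Lemma 26 (Bound on $\epsilon_3(m,n)$). Let $\hat\beta(u)$ be a solution to the perturbed QR problem

$$\hat\beta(u) \in \arg\min_{\beta \in \mathbb{R}^m} \mathbb{E}_n[\rho_u(Y_i - Z_i'\beta)] + \mathcal{A}_n(u)'\beta.$$

If the data are in general position,

$$\epsilon_3(m,n) = \sup_{u \in \mathcal{U}} n^{1/2} \|\mathbb{E}_n[\psi_i(\hat\beta(u), u)] + \mathcal{A}_n(u)\| \leq \min\left\{\frac{m}{\sqrt{n}} \zeta_m,\ \sqrt{\phi_n m}\right\}$$

holds with probability 1.

Proof. Note that the dual problem associated with the perturbed QR problem is

$$\max_{(u-1) \leq a_i \leq u} \mathbb{E}_n[Y_i a_i] : \mathbb{E}_n[Z_i a_i] = -\mathcal{A}_n(u).$$

Letting $\hat a(u)$ denote the solution for the dual problem above, and letting $a_i(\hat\beta(u)) := (u - 1\{Y_i \leq Z_i'\hat\beta(u)\})$, by the triangle inequality

$$\epsilon_3(m,n) \leq \sup_{\|a\| \leq 1, u \in \mathcal{U}} \sqrt{n}\, |\mathbb{E}_n[(Z_i'a)(a_i(\hat\beta(u)) - \hat a_i(u))]| + \sup_{u \in \mathcal{U}} \sqrt{n}\, \|\mathbb{E}_n[Z_i \hat a_i(u)] + \mathcal{A}_n(u)\|.$$

By dual feasibility $\mathbb{E}_n[Z_i \hat a_i(u)] = -\mathcal{A}_n(u)$, and the second term is identically equal to zero.

We note that $a_i(\hat\beta(u)) \neq \hat a_i(u)$ only if the $i$th point is interpolated. Since the data are in general position, with probability one the quantile regression interpolates $m$ points ($Z_i'\hat\beta(u) = Y_i$ for $m$ points for every $u \in \mathcal{U}$).

Therefore, noting that $|a_i(\hat\beta(u)) - \hat a_i(u)| \leq 1$,

$$\epsilon_3(m,n) \leq \sup_{\|a\| \leq 1, u \in \mathcal{U}} \sqrt{n} \sqrt{\mathbb{E}_n[(Z_i'a)^2]} \sqrt{\mathbb{E}_n[(a_i(\hat\beta(u)) - \hat a_i(u))^2]} \leq \sqrt{\phi_n m}$$

and, with probability 1,

$$\epsilon_3(m,n) \leq \sup_{\|a\| \leq 1, u \in \mathcal{U}} \sqrt{n}\, \mathbb{E}_n[1\{a_i(\hat\beta(u)) \neq \hat a_i(u)\}] \max_{1 \leq i \leq n} \|Z_i\| \leq \frac{m}{\sqrt{n}} \max_{1 \leq i \leq n} \|Z_i\|.$$

□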
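
The interpolation argument in the proof of Lemma 26 can be illustrated in the simplest case $m = 1$ with $Z_i = 1$ and no perturbation term. This is a minimal sketch under those stated assumptions, not the paper's estimator: with an intercept only, a QR solution is a sample quantile, exactly one observation is interpolated when the data are distinct, and the unscaled score sum is at most $m = 1$ in absolute value. The function names are hypothetical.

```python
import math


def sample_quantile(y, u):
    """An intercept-only QR solution: the ceil(n*u)-th order statistic."""
    ordered = sorted(y)
    k = max(1, math.ceil(len(y) * u))
    return ordered[k - 1]


def score_sum(y, u, beta):
    """Sum over i of (u - 1{Y_i <= beta}), the unscaled sample score for Z_i = 1."""
    return sum(u - (1.0 if yi <= beta else 0.0) for yi in y)
```

Dividing the score sum by $\sqrt{n}$ gives $n^{1/2} |\mathbb{E}_n[u - 1\{Y_i \leq \hat\beta(u)\}]| \leq 1/\sqrt{n}$, which matches the $(m/\sqrt{n}) \max_i \|Z_i\|$ form of the bound in this special case.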



Lemma 27 (Bound on $\epsilon_4(m,n)$). Under conditions S.1-S.4, and $\zeta_m^2 \log n = o(n)$,

$$\epsilon_4(m,n) \lesssim_P \sqrt{\frac{\zeta_m^2 \log n}{n}} = o(1).$$

Proof. The result follows from Theorem 13 of Guédon and Rudelson [18] under our assumptions. □

Lemma 28 (Bounds on $\epsilon_5(m,n)$ and $\epsilon_6(m,n)$). Under S.2, $h_n = o(1)$, and $r = o(1)$,

$$\epsilon_5(m,n) \lesssim_P \sqrt{\frac{\zeta_m^2 m \log n}{n h_n}} + \frac{m \zeta_m^2}{n h_n} \log n \quad \text{and} \quad \epsilon_6(m,n) \lesssim m^{-\kappa} + r\zeta_m + h_n.$$

Proof. To bound $\epsilon_5$ first let

$$\mathcal{H}_{m,n} = \{1\{|Y_i - Z_i'\beta| \leq h\}(a'Z_i)^2 : \|\beta - \beta(u)\| \leq r, h \in (0, H], a \in S^{m-1}\}.$$

Then, by Lemma 16,

$$\epsilon_5(m,n) = \frac{n^{-1/2}}{h_n} \sup_{f \in \mathcal{H}_{m,n}} |\mathbb{G}_n(f)| \lesssim_P \frac{n^{-1/2}}{h_n} J(m) \left(\sup_{f \in \mathcal{H}_{m,n}} E[f^2] + n^{-1} J(m)^2 M^2(m,n) \log n\right)^{1/2} \log^{1/2} n,$$

where $J(m) \lesssim \sqrt{m}$ by the VC dimension of $\mathcal{H}_{m,n}$ being of $O(m)$ by Lemma 19, and $M(m,n) = \max_{i \leq n} H_{m,n,i}$, where $H_{m,n,i} = \|Z_i\|^2$ is the envelope of $\mathcal{H}_{m,n}$ in the sample of size $n$. Therefore the envelope is bounded by $\max_{1 \leq i \leq n} \|Z_i\|^2$. We also get that

$$\sup_{f \in \mathcal{H}_{m,n}} E[f^2] \leq \sup_{\beta \in \mathbb{R}^m, a \in S^{m-1}} E[1\{|Y_i - Z_i'\beta| \leq h_n\}(a'Z_i)^4] \leq 2\bar f h_n \sup_{a \in S^{m-1}} E[(a'Z_i)^4].$$

By S.2 $\bar f$ is bounded, and by S.4

$$\sup_{f \in \mathcal{H}_{m,n}} E[f^2] \leq 2\bar f \zeta_m^2 h_n \sup_{a \in S^{m-1}} E[(a'Z_i)^2] \lesssim \zeta_m^2 h_n.$$

Collecting terms,

$$\epsilon_5(m,n) \lesssim_P \frac{n^{-1/2}}{h_n} \sqrt{m} \left(\zeta_m^2 h_n + \left(m \max_{1 \leq i \leq n} \|Z_i\|^4/n\right) \log n\right)^{1/2} \log^{1/2} n$$

$$\lesssim \left(\frac{\zeta_m^2 m \log n}{n h_n} + \frac{m^2 \max_{1 \leq i \leq n} \|Z_i\|^4}{n^2 h_n^2} \log^2 n\right)^{1/2}$$

$$\lesssim \sqrt{\frac{\zeta_m^2 m}{n h_n}} \log^{1/2} n + \frac{m \max_{1 \leq i \leq n} \|Z_i\|^2}{n h_n} \log n.$$

To show the bound on $\epsilon_6$ note that $|f'_{Y|X}(y|x)| \leq \bar f'$ by S.2. Therefore

$$E[1\{|Y - Z'\beta| \leq h_n\}(a'Z)^2] = E\left[(a'Z)^2 \int_{-h_n}^{h_n} f_{Y|X}(Z'\beta + t|X)\, dt\right]$$

$$= E\left[(a'Z)^2 \int_{-h_n}^{h_n} \left(f_{Y|X}(Z'\beta|X) + t f'_{Y|X}(Z'\beta + \tilde t|X)\right) dt\right]$$

$$= 2h_n E[f_{Y|X}(Z'\beta|X)(a'Z)^2] + O(2h_n^2 \bar f' E[(Z'a)^2])$$

by the mean-value theorem. Moreover, we have for any $(u, \beta) \in R_{m,n}$

$$E[f_{Y|X}(Z'\beta|X)(a'Z)^2] = E[f_{Y|X}(Z'\beta(u)|X)(a'Z)^2] + E[(f_{Y|X}(Z'\beta|X) - f_{Y|X}(Z'\beta(u)|X))(a'Z)^2]$$

$$= E[f_{Y|X}(Z'\beta(u)|X)(a'Z)^2] + O(E[\bar f' |Z'(\beta - \beta(u))| (a'Z)^2])$$

$$= a'\tilde J_m(u)a + O\left(\bar f' r \sup_{a \in S^{m-1}} E[|a'Z|^3]\right)$$

$$= a'J_m(u)a + O(m^{-\kappa}) + O\left(\bar f' r \sup_{a \in S^{m-1}} E[|a'Z|^3]\right),$$

where the last line follows from Lemma 12, S.2 and S.5. Since $\bar f'$ is bounded by assumption S.2, we obtain $\epsilon_6(m,n) \lesssim h_n + m^{-\kappa} + r \sup_{a \in S^{m-1}} E[|a'Z|^3]$. Finally, conditions S.3 and S.4 yield $\sup_{a \in S^{m-1}} E[|a'Z|^3] \leq \zeta_m \sup_{a \in S^{m-1}} E[|a'Z|^2] \lesssim \zeta_m$ and the result follows.

□